Keep a welcome


The plight of a record number of refugees is something the West cannot ignore. Humanitarian 
values should be upheld, and people fleeing war and persecution must be offered protection. 


In the refugee crisis facing Europe and the Middle East, an image can be worth a thousand articles or opinion pieces. Academics and humanitarian organizations have long battled to debunk the vicious myths and disinformation that often surround the refugee issue, and to counter often fact-free government policies — to little effect. It took a single iconic and heartbreaking image of a three-year-old Syrian boy, Aylan Kurdi, washed up drowned on a beach in Turkey for the world’s conscience to wake up to the plight of refugees.

Science and other academic interests have a long tradition of offering both refuge and professional hope to displaced people. Almost every discipline has its own story of influential figures in the field who arrived with oppression and conflict snapping at their heels. This journal has long chronicled and supported such efforts. In June 1939, for example, Nature published a three-page editorial that concluded that if Britain relaxed its “exceedingly cautious” attitude to accepting refugees, this would not only defend humanitarian values and academic freedom, but also “might prove in the long run to be wise and sound from the economic point of view”.

What is there to say in 2015? Worldwide, there are some 60 million refugees, up from 37.5 million a decade ago — the biggest refugee crisis since the Second World War. Yet the humanitarian response so far has been largely inadequate. The shrill rhetoric in many European Union countries and other wealthy nations claiming an ‘invasion’ of refugees doesn't stand up to scrutiny. Four million refugees have fled Syria since the conflict began there in 2011, but last year, the United Kingdom accepted 4,500 Syrian refugees, or just 0.007% of the UK population. Among the more generous EU countries, Germany took in 40,000 and Sweden 34,000 — the United States took only 4,750. By contrast, over the same period, Turkey temporarily accepted 1.5 million, and Lebanon, a tiny country of just 4.5 million people, took in some 1.15 million refugees, or 26% of its population.

EU refugee law is a mess. For refugees to apply for asylum, they must first reach a territory outside their own country. But the EU, and other countries, have increasingly sought to circumvent international refugee law by introducing rules to keep refugees out and so prevent them from applying in the first place.

A pernicious 2001 EU directive, for example, erects a barrier by imposing fines and the costs of repatriating illegal immigrants on airline, train, shipping and other carriers, essentially shifting the responsibility for deciding who is a legitimate refugee and who is an illegal migrant from governments to the carrier companies. Predictably, carriers have refused to accept passengers who lack visas. This fortress-Europe mentality explains why, this year alone, more than 300,000 people have embarked on perilous crossings of the Mediterranean — with 2,600 perishing — instead of taking a commercial ferry or airliner to apply for asylum.

There is also no EU-wide asylum status: decisions on applications are left to each member state, and member states do not mutually recognize one another's positive decisions. And the seriously flawed ‘Dublin Regulation’ also obliges the EU member state in which a refugee first arrives to process the refugee’s asylum application. This has resulted in frontier countries such as Greece and Italy bearing a hugely disproportionate burden.
The rule also frustrates applicants who have a legitimate preference for a specific country, for example to join their extended family. This encourages irregular movement within the EU, and allows other member states to forcibly return refugees to their first port of call — so turning what should be a humanitarian exercise into one of excessive coercion and criminalization.

In August, Germany’s Chancellor Angela Merkel rightly suspended her country’s adherence to the Dublin Regulation, and called for a radical, permanent EU-wide system of processing asylum applications, with an enforced distribution of refugees throughout EU member states. Merkel last week courageously stated that Germany itself can and will cope with its inflow of refugees, an expected 800,000 this year. The proposal for EU-wide distribution is vigorously opposed by some member states, in particular the Czech Republic, Poland, Hungary and Slovakia.

The public outcry following the photograph of Aylan has given the proposal new momentum, with French President François Hollande lending his support last week, as has the United Nations. The EU will formally discuss the proposal on 14 September; it should be embraced as long-overdue reform.

The scientific community must also play its part. It is in everyone's interest for refugee students and academics to be given opportunities to continue their careers, because otherwise the Middle East and elsewhere risk losing a generation of talent. The Western academic community must boost efforts to welcome refugee academics and students.


Money matters 


It is not how much people have, it is how much
we know they have that stokes inequality.


It would be so convenient if fundamental laws of nature told us how best to run a society. Governance would be a simple optimization problem, like finding the shortest route through a network; we could do without left-right political confrontation, and just solve the equations. Unfortunately, governance is not a well-posed problem. There must inevitably be balance and compromise: for example, between the rights of the individual and the overall good of society. This is what makes politics and economics not just controversial, but interesting.

Inequality is one of the biggest items on the agendas of both of these disciplines. Few people are likely to speak in favour of inequality as such, but in stereotypical terms the political right defends wealth as a reward for hard work, whereas the left deplores a society in which, as economist Joseph Stiglitz has said of the United States, “1 percent of the people take nearly a quarter of the nation’s income”. It seems an unavoidable truth that a free-market capitalist system will create wealth inequality; to a free-market fundamentalist who sees markets as meritocratic optimizers of efficiency and resource utilization, that is not only necessary but moral. Under that philosophy, by intervening in the market in the hope of making the outcome ‘fairer’, we only throw a spanner in the works.

Yet even if one accepts some inequality as a necessary evil, there are options beyond laissez-faire. How, and how strenuously, governments and legislators should attempt to limit the extent of wealth inequality — crudely measured by the Gini coefficient, which quantifies the statistical dispersion of income distribution — is currently a hotly disputed matter. Should companies and banks be restricted in what they can pay their chief executives? Should taxes aim to inhibit or reduce the perpetuation of inherited wealth? Or is all this crypto-communist social engineering?
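The Gini coefficient runs from 0, when everyone holds the same amount, towards 1, when one person holds everything. As an illustration only, here is a minimal sketch of one standard way to compute it, from the mean absolute difference between every pair of individuals; the function name `gini` and the example wealth figures are assumptions for this sketch, not data from any study discussed here.

```python
from typing import Sequence


def gini(wealth: Sequence[float]) -> float:
    """Return the Gini coefficient of a list of non-negative wealth values.

    Mean-absolute-difference form:
        G = sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i)
    G is 0 for perfect equality and (n - 1) / n when one person holds everything.
    """
    n = len(wealth)
    if n == 0:
        raise ValueError("need at least one wealth value")
    if any(x < 0 for x in wealth):
        raise ValueError("wealth values must be non-negative")
    total = sum(wealth)
    if total == 0:
        return 0.0  # nobody owns anything: treat as perfectly equal
    # Sum of absolute differences over all ordered pairs (i, j).
    pair_diffs = sum(abs(a - b) for a in wealth for b in wealth)
    return pair_diffs / (2 * n * total)


if __name__ == "__main__":
    print(gini([25, 25, 25, 25]))  # 0.0: equal shares
    print(gini([0, 0, 0, 100]))    # 0.75: one of four people owns everything
    print(gini([10, 10, 10, 70]))  # 0.45: moderately unequal
```

The pairwise sum grows with the square of the population, which is fine for a handful of players; for large populations the same quantity is usually computed from sorted values with a rank-weighted formula.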

The strongest argument for such measures is not that they make things more ‘fair’ (although meritocratic defences of free-market inequalities should surely at least demand a level playing field). Rather, it is that gross wealth inequality is socially corrosive. It polarizes attitudes, foments unrest (see, for example, the Occupy movement) and degrades trust and cooperation. At face value, a study published online this week in Nature supports that view — but with an added twist.

In the study, groups of volunteers played a simple economic game involving cooperation (a “public goods game”), in which they could lose or gain wealth through voluntary redistribution within social networks that started with three different levels of inequality (A. Nishi et al. Nature http://dx.doi.org/10.1038/nature15392; 2015). Crucially, in some games the wealth of participants was made visible to others, whereas in others it was kept hidden. For “invisible” wealth conditions, the games tended to converge on a fairly low Gini coefficient, but “visible” wealth produced higher (and less stable) average Gini coefficients. This effect was more pronounced when the initial inequality was greater. In other words, simply hiding wealth decreased the wealth disparity in otherwise identical games and networks.
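For readers unfamiliar with the format, a public goods game can be sketched in a few lines. The version below is a generic textbook variant, not the networked protocol used by Nishi et al.: each player pays a chosen contribution into a common pot, the pot is multiplied by a factor (an assumed 2.0 here) and shared equally among all players. The function name, multiplier and wealth figures are hypothetical.

```python
from typing import List, Sequence


def public_goods_round(
    wealth: Sequence[float],
    contributions: Sequence[float],
    multiplier: float = 2.0,  # assumed value; real experiments choose their own
) -> List[float]:
    """Play one round of a basic public goods game; return each player's new wealth.

    Every player pays its contribution into a shared pot. The pot is multiplied
    by `multiplier` and split equally, so each player receives the same share
    regardless of how much it put in.
    """
    if len(wealth) != len(contributions):
        raise ValueError("need one contribution per player")
    for w, c in zip(wealth, contributions):
        if not 0 <= c <= w:
            raise ValueError("each contribution must be between 0 and that player's wealth")
    share = sum(contributions) * multiplier / len(wealth)
    return [w - c + share for w, c in zip(wealth, contributions)]


if __name__ == "__main__":
    start = [10, 10, 10, 70]
    # Every player contributes 20% of its wealth.
    after = public_goods_round(start, [2, 2, 2, 14])
    print(after)  # [18.0, 18.0, 18.0, 66.0]
```

Total wealth grows from 100 to 120 because pooled contributions are multiplied, which is why less cooperation can mean less overall wealth; and because the share is equal, the gap between richest and poorest narrows from 60 to 48.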

Still more importantly, visible wealth reduced the overall cooperation and interconnectedness of the social network, and in fact led to lower total wealth. As the authors say: “it is not inequality per se that is so problematic, but rather visibility” of that inequality. This fits with the established idea that it is relative, not absolute, differences in wealth that compromise happiness and promote discord: we resent what our neighbours have and we don't. What grates is not knowing that others have more than us, but seeing that difference ostentatiously displayed.


It is dangerous, however, to think that these laboratory experiments can be extrapolated into a political or moral message for the real world. They invite us to frown on bling and the champagne-drenched excesses of financiers, but we should be cautious about their implications, even (or especially) if they flatter our preconceptions. Besides, there is scope here for upsetting both ends of the political spectrum. Right-wingers might deplore an injunction to hide one’s wealth as compromising personal freedom — isn't it up to us how we spend our money? Left-wingers might dislike the idea of being relaxed about inequality as long as it is kept out of sight — and, anyway, might that not provoke a climate of secrecy and suspicion?

For now, the results should simply inform and broaden the discussion. They show, for example, that inequality is not solely down to market mechanisms, but also responds in subtle ways to our own dispositions. Above all, the findings are a reminder, along with related behavioural experiments on the role of punishment in public-goods games, that John Maynard Keynes's “animal spirits” are an irreducible part of what shapes a market economy. It is time to lay the idea of the rational Homo economicus to rest.


Loaded language 


There can be more to a question than appears 
at first sight. 


William Burroughs, the infamous US writer and author of Naked Lunch, had a typically counter-culture approach to seeking knowledge: “Your mind will answer most questions if you learn to relax and wait for the answer.”

If only it were that easy for the rest of us. In reality, asking a question is harder than it might seem. British Prime Minister David Cameron discovered this last month when the UK Electoral Commission told him to change the wording of a proposed question for the country’s referendum on membership of the European Union.

Cameron's suggestion — “Should the United Kingdom remain a member of the European Union?” — was a classic example of what linguists call acquiescence bias. Take the Burroughs route and relax, and the answer that most often comes to mind is to stick with the status quo. Rejecting something is more difficult.

If that was Cameron's intention, then his plan has been rumbled. 
The question will now have the extra clause at the end: “or leave the 
European Union?” To answer that one, citizens must now make more 
of a cognitive effort, and that should remove the chance for bias. 

Cameron's linguistic nudging was more subtle than most attempts to bias questions. Lawyers and politicians tend to be fans of more explicit tricks of language. There is the classic loaded question, “When did you stop beating your wife?”, which presupposes guilt; and there is the pernicious influence of the hypothetical question. During the 2000 US election campaigns, South Carolina voters were asked: “Would you be more likely or less likely to vote for John McCain for president if you knew he had fathered an illegitimate black child?”

Researchers have found that the way a question is phrased can alter 
how people remember incidents. Witnesses asked how quickly cars 
were travelling when they “smashed” are more likely to imagine that 
they saw broken glass on the ground than others told that the vehicles 
simply “bumped” into each other or “collided”. They were also more 
likely to say that the cars were travelling at higher speed. 

Scientists have a particular relationship to questions. Turned into 
testable null hypotheses, questions are at the heart of the scientific 
method. Allied with proper experimental design and robust statistical 
analysis, they can be answered with confidence — or not. 

Some answers are known before the question is asked; other questions are genuine calls for information. Some questions are meant to benefit the questioner, others to empower those who answer them. How to judge? In all areas — politics and science included — the best questions are simple and to the point. So who knows what the residents of Quebec thought when confronted with the following for their referendum on independence in 1995:

“Do you agree that Quebec should become sovereign, after having made a formal offer to Canada for a new economic and political partnership, within the scope of the bill respecting the future of Quebec and of the agreement signed on June 12, 1995?”

The ‘No’ vote won with 50.6%. ‘Don’t know’s were not recorded.
WORLD VIEW


Reproducibility will not
cure what ails science


A bill to make data for environmental regulation more transparent reveals the
fuzzy boundary between science and ideology, argues Daniel Sarewitz.


(including Nature), are acknowledging that a culture of science focused on rewarding eye-catching and positive findings may have resulted in major bodies of knowledge that cannot be reproduced. Private-sector, academic and non-profit groups are leading multiple efforts to replicate selected published findings, and so far the results do not make happy reading. Several high-profile endeavours have been unable to reproduce the large majority of peer-reviewed studies that they examined. Meanwhile, the US National Academies is preparing to publish a high-profile report on scientific integrity that will flag irreproducibility as a key concern for the research enterprise.

As the spotlight shines on reproducibility, uncomfortable issues will emerge at the interface of research and ‘evidence-based’ policy.

Consider, for example, the Secret Science Reform Act of 2015, a US bill that would “prohibit the Environmental Protection Agency from proposing, finalizing, or disseminating regulations or assessments based upon science that is not transparent or reproducible”. Passed in March by the House of Representatives essentially along party lines (Republicans in favour, Democrats opposed) and now awaiting action by the Senate, the bill has been vigorously opposed by many scientific and environmental organizations.

They argue, probably correctly, that the bill's intent is to block and even roll back environmental regulations by requiring that all data on which the rules are based be made publicly available for independent replication. One of the main objections is that a lot of the scientific research that informs regulatory decisions is not of the sort that can be replicated. For example, a statement of opposition from numerous scientific societies and universities explains: “With respect to reproducibility of research, some scientific research, especially in areas of public health, involves longitudinal studies that are so large and of great duration that they could not realistically be reproduced. Rather, these studies are replicated utilizing statistical modeling.”

Precisely. Replication of the sort that can be done with tightly controlled laboratory experiments is indeed often impossible when you are studying the behaviour of dynamic, complex systems, for example at the intersection of human health, the natural environment and technological risks. But it is hard to see how this amounts to an argument against mandating open access to the data from these studies. Growing concerns about the quality of published scientific results have often singled out bad statistical practices and modelling assump-


Although concerns about the bill’s consequences are reasonable, the idea that it would be bad to make public the data underlying environmental regulations seems to contradict science’s fundamental claims to objectivity and legitimacy. In June, a commentary in Science by an array of leading voices, including the current and future heads of the National Academies, flagged “increased transparency” and “increased data disclosure” as crucial elements of science’s “self-correcting norm” that can help to address “the disconcerting rise in irreproducible findings” (B. Alberts et al. Science 348, 1420–1422; 2015). This is more or less the position taken by the Secret Science bill’s sponsor, Representative Lamar Smith (Republican, Texas): “The bill requires the EPA to use data that is available to the public when the Agency writes its regulations. This allows independent researchers to evaluate the studies that the EPA uses to justify its regulations. This is the scientific method.”

This battle for the soul of science is almost surreal in its avoidance of the true issue, which is ideological. One side believes that the government should introduce stricter environmental regulations; the other wants fewer restrictions on the marketplace. Science is the battleground, but it cannot adjudicate this dispute. At its core, the disagreement is about values, not facts. But just as importantly, the facts themselves are inevitably incomplete, uncertain, contested and, as we have been learning, often unreliable.

Like a divorced couple bitterly fighting over the custody of their child, both sides in the Secret Science debate insist that they have only the interests of science at heart. Republicans are using a narrow, idealized portrayal of science — that it produces clear and reproducible findings — as a weapon to undercut environmental and public-health regulation of the private sector. But many scientists, environmentalists and Democrats have long used similar portrayals to justify the same regulations, and to bash Republicans as anti-scientific when they did not agree.

More and more, science is tackling questions that are relevant to society and politics. The reliability of such science is often not testable with textbook methods of replication. This means that quality assurance will increasingly become a matter of political interpretation. It also means that the ‘self-correcting norm’ that has served science well for the past 500 years is no longer enough to protect science’s special place in society. Scientists must have the self-awareness to recognize and openly acknowledge the relationship between their political convictions and how they assess scientific evidence.


Daniel Sarewitz is co-director of the Consortium for Science, Policy and 
Outcomes at Arizona State University, and is based in Washington DC. 
e-mail: daniel.sarewitz@asu.edu 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


A crack in the
standard model? 


A signal from the Large
Hadron Collider (LHC) could
challenge the standard model
of particle physics, which
describes how matter and
forces interact.

The LHCb experiment
at CERN, Europe’s particle-physics
lab near Geneva, has
uncovered an unexpected
difference in the rate at which 
short-lived particles called 
B mesons undergo certain 
decays into muons and 
taus (heavier cousins of the 
electron). The standard model 
says that once the particles’ 
mass differences are taken into 
account, the decays should 
occur at exactly the same rate. 

The deviation is small, and 
the chance that it is a statistical 
fluctuation in random noise is 
too high to claim a discovery 
(the significance is 2.1 sigma, 
but physicists’ threshold 
for a discovery is 5 sigma). 
However, the results are 
intriguing because they match 
previous measurements made 
by two other experiments 
elsewhere. 
Phys. Rev. Lett. (in the press) 
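The ‘sigma’ figures above count how many standard deviations a measurement lies from the standard-model expectation, assuming Gaussian noise. A minimal sketch, using only Python's standard library, converts a significance in sigma into the probability that noise alone would give a fluctuation at least that large in one direction (the one-sided convention common in particle physics); the function name is an illustrative choice.

```python
import math


def one_sided_p_value(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma` (upper tail only).

    Uses the identity P(Z > z) = erfc(z / sqrt(2)) / 2.
    """
    return 0.5 * math.erfc(sigma / math.sqrt(2))


if __name__ == "__main__":
    for s in (2.1, 5.0):
        print(f"{s} sigma -> p = {one_sided_p_value(s):.2e}")
```

At 2.1 sigma, noise alone would produce such a deviation roughly 1.8% of the time, or about 1 time in 56; the 5-sigma discovery threshold corresponds to roughly 1 chance in 3.5 million.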


IMMUNOLOGY


Odd fish use old
immune trick


Mice and lamprey fish produce a similar antibody response to influenza, despite being separated by hundreds of millions of years of evolution. Lampreys (pictured) are jawless fish whose common ancestor with mammals lived 550 million years ago. They defend themselves with antibodies that are unlike those produced by the immune systems of jawed vertebrates. A team led by Jonathan Yewdell at the National Institute of Allergy and Infectious Diseases in Bethesda, Maryland, exposed lamprey larvae to inactivated influenza virus and found that their blood cells produced antibodies that recognize key amino-acid sites on the head of the haemagglutinin protein of influenza. This is the same region as that targeted by influenza antibodies from mice, suggesting that lamprey and mouse antibodies recognize pathogens in a similar way despite their huge evolutionary separation.
eLife 4, e07467 (2015)


ANIMAL BEHAVIOUR


Seabirds duped by plastic waste


It is likely that most seabirds have consumed plastic rubbish floating in the ocean after mistaking it for prey.

Chris Wilcox at the Commonwealth Scientific and Industrial Research Organisation in Hobart, Australia, and his colleagues collated published data on the diets of 135 seabird species over the past four decades, including the red-footed booby (Sula sula; pictured) and the Cape petrel (Daption capense). According to the data, the proportion of birds that had eaten plastic increased by about 1.7% per year. Using this figure, the team predicts that, had these studies been done today, more than 90% of the seabirds would have eaten plastic. By 2050, that could reach 99% if the flow of plastic waste to the seas is not reduced. The researchers found that the area of highest risk was in the Tasman Sea between Australia and New Zealand.

Proc. Natl Acad. Sci. USA http://doi.org/7dv (2015)


Basque ancestors 
were farmers 


The ancestors of people from the Basque region of Spain were early farmers — not hunter-gatherers as was thought. Farming practices emerged around 11,000 years ago in the Near East and later spread to Europe as people migrated in waves, eventually replacing the hunter-gatherer lifestyle. To study this influx in Iberia, a team led by Mattias Jakobsson at Uppsala University in Sweden sequenced the genomes of eight individuals from remains found in a cave in northern Spain. These people lived 5,500–3,500 years ago, after the arrival of Spain's first farmers around 7,000 years ago. The closest living descendants of the sequenced people are modern Basques, contradicting past studies that linked Basques to late hunter-gatherer groups. The Basque language is distinct from all other European tongues, and the authors say that it could be a relic of Spain’s first farmers.
Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/pnas.1509851112 (2015)
ECOLOGY


Predator biomass 
no match for prey 


Twice as much prey does not lead to twice as many predators, according to Ian Hatton at McGill University in Montreal, Canada, and his colleagues. In theory, more prey should mean proportionally more predators. The authors analysed published data on biomass and numbers of individuals for 2,260 ecosystems in 1,512 locations worldwide. They found that the biomass of predators increased along with the biomass of their prey, but that the ratio of predator to prey biomass decreased. Across ecosystems from grasslands to oceans, predator biomass scales with prey biomass to an exponent of around 0.75, rather than 1 as in a linear relationship. Similar scaling laws are well known between an organism's body mass and features such as metabolism, growth and reproduction, but had not been identified across whole ecosystems, say the researchers. This indicates an unappreciated degree of ecosystem organization.
Science http://doi.org/7f3 (2015)
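The gap between an exponent of 0.75 and a linear exponent of 1 is easiest to see with numbers. The sketch below assumes the simple power law predator biomass = c × (prey biomass)^0.75, with an arbitrary constant c set to 1; it illustrates the scaling described above and is not the authors' fitted model.

```python
def predator_biomass(prey_biomass: float, exponent: float = 0.75, c: float = 1.0) -> float:
    """Hypothetical power-law relation: predator biomass = c * prey_biomass ** exponent."""
    if prey_biomass < 0:
        raise ValueError("biomass cannot be negative")
    return c * prey_biomass ** exponent


if __name__ == "__main__":
    for prey in (1.0, 2.0, 16.0):
        pred = predator_biomass(prey)
        # With exponent < 1 the predator-to-prey ratio falls as prey biomass rises.
        print(f"prey {prey:>5}: predators {pred:.3f}, predator/prey ratio {pred / prey:.3f}")
```

Under this assumption, doubling prey biomass raises predator biomass by a factor of 2^0.75, about 1.68 rather than 2, and a 16-fold increase in prey yields only 8 times the predator biomass, so the predator-to-prey ratio halves.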


How colonies of 
sea animals swim 


The animals that make up 
a colony of sea creatures 
contribute to its motion 
depending on their size. 
Complex colonies called 
physonect siphonophores 
— relatives of jellyfish — are 
formed of many individuals 
that propel the colony using 
jets of water. John Costello at 
Providence College in Rhode 
Island and his team collected 
colonies of one physonect 
species (Nanomia bijuga) 
and photographed them 
as they swam. Individuals 
worked together to drive 
the colony around, but did 
not all contribute equally. 
Smaller, weaker colony 
members steered the 
swimming colony, and their 
more powerful neighbours
provided the thrust.
Nature Commun. http://dx.doi.org/10.1038/ncomms9158 (2015)


Devil tumour type 
affects survival 


Some lineages of the infectious 
facial tumours that are 
devastating populations of the 
Tasmanian devil (Sarcophilus 
harrisii) can result in worse 
outcomes for animals. 

Rodrigo Hamede at the 
University of Tasmania in 
Hobart, Australia, and his 
colleagues have monitored 
the outbreak of devil facial 
tumour disease at a site in
northwestern Tasmania since 
2006. Animals at this site 
initially had higher survival 
rates than other infected 
populations and a lower 
proportion of infected animals 
overall. Their tumours were 
found to have four sets of 
chromosomes. 

Around 2011–12, this
‘tetraploid’ tumour lineage
was replaced by a ‘diploid’ type
with two sets of chromosomes,
which the authors found was
associated with an increased
disease prevalence in adults
(from around 25% of animals
infected to 80%) and a
significant population decline. 
Tumour variance can shape 
both epidemic patterns and 
outcomes, the authors warn. 
Proc. R. Soc. B 282, 20151468 (2015)


Ring-shaped trap 
holds ions in check 


An electromagnetic trap can 
suspend 400 ions at a time, 
providing a useful system 
for studies of quantum 
information processing. 

Ions can be pinned in place 
using oscillating and static 
electric fields, but trapping 
large numbers is challenging 
because experimenters must 
compensate for unavoidable 
background fields at each 
ion location. Daniel Stick at 
Sandia National Laboratories 
in Albuquerque, New Mexico, and his colleagues used 88 electrodes to measure and then suppress undesired electric fields at points around a ring-shaped trap. By compensating for background fields in just one direction, they succeeded in trapping 400 calcium ions at uniform intervals around the circle. Most proposed quantum devices arrange ions in square lattices, but circular chains of ions could find uses in new designs as well as in quantum simulators, say the authors.
Phys. Rev. Appl. 4, 031001 (2015)


SOCIAL SELECTION


Popular topics
on social media


Journal of ideas, data and more


With so many journals already in existence, it is rare for a new title to draw attention. But researchers and publishing experts are taking notice of Research Ideas and Outcomes, or RIO, an open-access journal that launched on 1 September (http://rio.pensoft.net). As well as standard articles, it will publish proposals, experimental designs, data and software, and aims to cover research from all stages of the research cycle. Kelly Visnak, a scholarly-communications librarian at the University of Texas at Arlington, tweeted: “This Open Journal is a game changer.” Stephen Curry, a structural biologist at Imperial College London, pondered on Twitter whether open access was now driving “the most innovative & important” developments in publishing.


Giant virus from
permafrost


A new species of ‘giant’ virus has been revived from a 30,000-year-old sample of Siberian permafrost.

The first giant virus visible with light microscopy was seen in 2003. Several species have been discovered since, including Pithovirus sibericum (pictured right) found in permafrost in 2014.


Jean-Michel Claverie and 
Chantal Abergel at the CNRS 
Institute of Microbiology of the 
Mediterranean in Marseille, 
France, and their team have 
now isolated another giant 
virus from the same piece of 
permafrost. 

They found the virus — 
named Mollivirus sibericum 
(pictured left) and seen as 
spherical particles around 
500-600 nanometres in 
size — multiplying in cultures 
of amoebas inoculated with 
the permafrost. Its genome 
is a double-stranded DNA of 
651,523 base pairs, which is 
unusually devoid of repeats. 
The authors say that such 
viruses are probably not 
rare, and that forms that 
could infect humans may 
be reawakened as mining 
and drilling become more 
common in the Arctic. 

Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/pnas.1510795112 (2015)
SEVEN DAYS


BUSINESS
Stem-cell safety 


On 31 August, Asterias Biotherapeutics reported cautious good news from early-stage trials of an embryonic-stem-cell treatment for the most severe forms of spinal-cord injury. Three people with injuries that left them with no feeling from the neck down were injected with low doses of oligodendrocytes, cells that are derived from embryonic stem cells and that support nerve growth. None experienced serious side effects, and the first participant, operated on in June, showed minor improvements in sensory function. The company, based in Menlo Park, California, plans to increase the number of cells in a dose, hoping to increase the effect.


Ethics revisited 


US agencies are working to update ethics rules that regulate biomedical research on humans. On 2 September, the US Department of Health and Human Services (HHS) announced a proposal to revise the ‘Common Rule’ policy, which governs human-subjects research at 18 US federal departments and agencies. The proposal calls for a single ethical review of research conducted at multiple sites, and more-stringent consent procedures for the use of specimens donated to biobanks. The HHS proposal opened for a 90-day public comment period on 8 September. See go.nature.com/cbd53s for more.


Anthrax review

The US Secretary of the Army has ordered a safety review of all nine Department of Defense laboratories that handle dangerous biological agents. The 2 September order follows a problem discovered during an ongoing investigation into a military lab at Dugway Proving Ground in Utah, which in May accidentally shipped live anthrax to a commercial lab. The current investigation found the bacteria outside the normal containment area at Dugway, although still within the confines of the enclosed lab used for dangerous agents. Labs must deliver their safety reports within 10 days of the order.

El Niño on track to be record-breaking
[Figure panels: 1 August 1997 and 1 August 2015; colour scale: sea-surface temperature anomaly (°C)]
A comparison of the El Niño weather pattern in 1997 and 2015 shows how the two had developed in a strikingly similar fashion by August in each of the years. Sea-surface temperature data from the National Center for Atmospheric Research in Boulder, Colorado, show each El Niño as a band of warmer-than-usual water (orange) along the eastern equatorial Pacific Ocean, with added warmth to the north in 2015. The 1997 event was the strongest in recent memory; the 2015 one now seems as if it could break that record (see Nature http://doi.org/7h6; 2015). The peak of the event is forecast for late autumn or early winter in the Northern Hemisphere.


Climate appeal

The Dutch government will appeal a landmark ruling on its climate policies, it announced on 1 September. In June, a district court in The Hague declared that the Netherlands must reduce its domestic greenhouse-gas emissions by at least 25% by 2020, relative to 1990 levels.

At present, the Netherlands is on track to achieve a 17% reduction in emissions by 2020, relative to 1990, in line with European Union obligations. A letter from the Ministry of Infrastructure and Environment to the chairman of the House of Representatives argued that the court's verdict might be incompatible with international law. The deadline for appeal is 24 September.


CRISPR endorsed

Five leading UK research organizations have backed work on human-genome editing. The consortium, which includes the Wellcome Trust and the Medical Research Council, wants to see further debate on the ethics of using gene-editing technologies such as CRISPR/Cas9, it said in a 2 September statement. These technologies are not yet ready for clinical trials, but the group says that it will continue to fund and support them. In the United Kingdom, genome-editing research is limited at present to non-reproductive cells and human embryos less than 14 days old.


RESEARCH
Super Stonehenge 


Researchers have discovered a 5,000-year-old row of at least 90 stones at the prehistoric monument of Stonehenge near Salisbury, UK, the Stonehenge Hidden Landscapes Project announced on 7 September. The stones, some as long as 4.5 metres, were found using non-invasive technologies within 3 kilometres of the famous stone circle. The row's construction could date to about 3000 BC, the same period as Stonehenge. The stones seem to have been purposely buried under the earthworks of the existing Durrington Walls mega-henge. The remnants have not yet been excavated, but the team hopes that they will improve understanding of the Neolithic period.


Forest loss halved 


The world’s forest area declined at a rate of 3.3 million hectares per year between 2010 and 2015, according to the latest global assessment from the Food and Agriculture Organization (FAO) of the United Nations. That is well below the 7.3 million hectares lost per year in the 1990s (R. J. Keenan et al. Forest Ecol. Mgmt 352, 9–20; 2015). FAO director-general José Graziano da Silva called the slow-down in deforestation an “encouraging tendency” when he launched the report in Durban, South Africa, on 7 September, but stressed that more still needs to be done.


AWARDS 


Lasker awards

The 70th Albert and Mary Lasker Foundation awards were announced on 8 September. Evelyn Witkin of Rutgers University in New Brunswick, New Jersey, and Stephen Elledge of Brigham and Women’s Hospital in Boston, Massachusetts, shared the award in basic medical research for their studies into how cells respond to and correct DNA damage. James Allison of the University of Texas MD Anderson Cancer Center in Houston won the clinical medical research award for work on cancer immunotherapies. The public-service award went to the humanitarian organization Médecins Sans Frontières for its work in battling the 2014 Ebola epidemic in Africa (pictured).


TREND WATCH

Two Europe-wide surveys of people at risk of heart disease, done 8 years apart, show that the proportion who smoke has remained the same, at 17%. But among those smokers, the proportion who do not intend to stop has risen sharply, from 23% to 34%. The EUROASPIRE surveys were run by the European Society of Cardiology in 2006–07 and 2014–15. In total, 5,890 people in Bulgaria, Croatia, Poland, Romania and the United Kingdom were surveyed, and 3,827 participated in both surveys.

NO INTENTION TO QUIT
Since the last survey in 2006–07, the proportion of people at highest risk of heart disease who smoke has stayed the same, but the proportion of those who do not intend to stop has increased.


EVENTS

Iran deal secured

US President Barack Obama has secured enough votes from Democrats in the US Senate for July’s multilateral deal on Iran’s nuclear programme to survive Republican opposition. Republicans are planning a resolution of disapproval. To block the resolution completely, the Obama administration needs 41 of the 100 votes in the Senate. But if he gets at least 34 votes in the deal’s favour, Obama can veto the disapproval resolution without the veto being overridden. As of 7 September, 38 Democratic senators supported the deal.


Polio comeback

Two children have become paralysed in Europe’s first polio cases in five years, the World Health Organization reported on 1 September. The cases, in a 10-month-old and a 4-year-old near Ukraine’s southwestern borders with Romania, Slovakia, Hungary and Poland, were caused by viruses that are mutated relatives of those in the live polio vaccine. Such vaccine-derived strains arise and spread where immunization coverage is low, but are considered easier to control than outbreaks of wild polio virus. Last year, just half of Ukrainian children received full immunizations against polio and other preventable diseases.


Radar retired

One of the two science instruments aboard NASA’s US$916-million Soil Moisture Active Passive (SMAP) satellite was declared dead on 2 September. SMAP was launched in January to produce frequent global maps of soil moisture. But its radar instrument, which measures energy reflecting off Earth’s surface, stopped transmitting on 7 July. The problem seems to be with the radar’s power-boosting amplifier. SMAP’s other science instrument, a radiometer, still works, but losing the radar means that the soil-moisture maps (including freezing and thawing cycles) will be of coarser resolution than planned.


14–16 SEPTEMBER
Guidelines for using satellite observations to reduce disaster risk are under discussion at the United Nations International Conference on Space-based Technologies for Disaster Management in Beijing.
go.nature.com/nw3k5r

14–25 SEPTEMBER
The theory of a world with many dimensions will be debated at length at the Stringy Geometry meeting at the Mainz Institute for Theoretical Physics, Germany.
go.nature.com/ahdgrg

15–19 SEPTEMBER
Experts in cell death gather at Cold Spring Harbor, New York.
go.nature.com/Ihszuw


STRUCTURAL BIOLOGY Cryo-EM reveals a raft of dazzling protein structures p.172


Amyloid-β protein (brown) has been found in the pituitary gland, which sits just outside the brain.


Alzheimer’s fear in hormone patients


Brain plaques may have been seeded by growth therapy. 


BY ALISON ABBOTT 


Only a decade ago, the idea that Alzheimer’s disease might be transmissible between people would have been laughed off the stage. But scientists have since shown that tissue extracts can transmit features of the disease between animals, and new results imply that humans, at least in one unusual circumstance, may not be an exception.

The findings, published in this issue of Nature, emerged during autopsy studies of the brains of eight people who had died of the rare but deadly Creutzfeldt–Jakob disease (CJD; Z. Jaunmuktane et al. Nature 525, 247–250; 2015). They contracted it decades after treatment with contaminated batches of growth hormone that had been extracted from the pituitary glands of human cadavers. Six of the brains, in addition to the damage caused by CJD, harboured the tell-tale amyloid pathology that is associated with Alzheimer’s disease.

“This is the first evidence of real-world transmission of amyloid pathology,” says molecular neuroscientist John Hardy of University College London (UCL). “It is potentially concerning.”

If confirmed, the findings raise the spectre that tens of thousands of other people treated with the human growth-hormone (hGH) extracts might be at risk of Alzheimer’s. And although there is no suggestion that Alzheimer’s could be contracted through normal contact with patients, some scientists worry that the findings may have broader implications: that Alzheimer’s could be passed on by other routes through which CJD can be transmitted, such as blood transfusions or contaminated surgical instruments.

CJD is one of several neurodegenerative diseases caused by an infectious, misfolded protein, or prion, called PrP. Its misfolded shape makes it sticky, so it forms clumps. Scientists now believe that Alzheimer’s could also be triggered by a similar misfolding, in this case of the peptide amyloid-β, with the disease’s plaques growing from small amyloid-β ‘seeds’. Mice and marmosets have developed plaques when their brains were injected with brain extracts containing amyloid-β; in mice, plaques developed even when the extracts were injected into the animals’ bellies.

The authors of the latest paper provide the first support for the theory that amyloid-plaque formation could be triggered in this way in humans, although “they fall short of providing the final proof of this”, says neuroscientist Mathias Jucker of the University of Tübingen, Germany, who is co-author of an accompanying News & Views article (see page 193). Such proof would require injecting the cadaver-derived hGH into animals under controlled conditions and seeing whether amyloid deposits develop as a consequence.

But it may not be easy to get hold of the original hGH extracts, which were prepared in various locations. Some are known to have been stored in Britain, where court cases about possible liability are ongoing, but scientists do not know whether other stocks have been kept. People who received the hGH injections will also be difficult to trace after so many years. The National Prion Clinic at UCL Hospital, which has a helpline for people who are concerned about the risk of CJD after hGH injections, will advise those who call asking about the new developments.

From 1958 until 1985, when the dangers were first realized, around 30,000 people worldwide had hGH injected into their muscles, mostly children who had not been growing at a normal rate. The preparations comprised pooled material extracted from thousands of cadavers. Some extracts turned out to have been contaminated with CJD prions, leading to 226 deadly infections by 2012, mostly in France (119 cases), Britain (65 cases) and the United States (29 cases). Numbers are still creeping up, because CJD has a long incubation period.

None of the eight patients studied, who were aged between 36 and 51 when they died, had shown clinical symptoms of Alzheimer’s disease, which also has a long incubation period. In four of the six who already had amyloid-β pathology, the deposits were widespread.

Because it is rare to see this type of amyloid pathology at such young ages, the scientists suspected that amyloid seeds may have been transferred with the hGH injections, just as the CJD prion had been. They did a series of investigations to rule out other explanations.

They determined that none of the eight individuals carried genes that would predispose them to early-onset Alzheimer’s or other neurodegenerative diseases. They looked for, but did not find, significant amyloid pathology in patients of a similar age who had died of CJD or other prion diseases but had never been treated with hGH.

Furthermore, because the growth hormone had been extracted from pituitary glands, the team checked whether amyloid pathology really can spread from the brain to the pituitary, which is located just outside the base of the brain. Confirming a 2013 US study, they found that in principle it can: of the pituitary glands of 49 people who had died with amyloid plaques in their brains, seven contained amyloid deposits.

“We think that the most plausible explanation for the occurrence of the amyloid pathology is that it had been transmitted by particular hGH extracts that happened to be contaminated with amyloid-β seeds as well as the CJD prions,” says John Collinge, a co-author of the paper and a neurologist at UCL. If this turns out to be the case, amyloid-β would have been a much more frequent contaminant in the different hGH batches than PrP was, because Alzheimer’s is a very common disease.

Prions are harder to deactivate than bacteria and viruses. They stick tightly to metals, and decontamination requires extreme sterilization conditions, which can harm fragile medical instruments. For these reasons, neurosurgeons do not routinely do this type of decontamination, says one German neurosurgeon, speaking off the record. The neurosurgeon adds that if it were confirmed that Alzheimer’s is transmitted in a prion-like way, the impact on public health and surgical practice would be major, and very expensive.

“We have learnt a lot about decontamination from our experience with CJD,” says neuropathologist Charles Duyckaerts at the Pitié-Salpêtrière Hospital in Paris. “But this is a wake-up call to the medical community to be particularly vigilant.”

With so much at stake, scientists are preparing to try to replicate the results independently. Duyckaerts says that he plans to do so on 20 or 30 subjects who died of CJD in France after receiving the cadaver-derived hGH treatment.
BIOMEDICAL SCIENCE


US agencies plan ethics overhaul


Government proposes long-awaited revision to regulations designed to protect human subjects.


BY HEIDI LEDFORD 


After years of uncertainty, the US government has revived an effort to update regulations that govern research involving human subjects. The changes would be the most significant since the rules were introduced in 1991.

On 2 September, the US Department of Health and Human Services (HHS) announced a proposal to address concerns that have emerged since the regulations, known collectively as the Common Rule, took effect. These issues include delays caused by overlapping ethics reviews of studies conducted at multiple sites, and the rise of genomic technologies that can identify the donors of anonymized samples.


The HHS will begin a 90-day public-comment period on the proposal next week and will decide how to proceed once that has ended, says Kathy Hudson, deputy director for science, outreach and policy at the US National Institutes of Health (NIH) in Bethesda, Maryland.

The HHS solicited public comments on a similar proposal in July 2011. As the years ticked by without further word on the fate of the revisions, observers grew concerned. “I was totally worried,” says Ezekiel Emanuel, a bioethicist at the University of Pennsylvania in Philadelphia, who helped to launch the effort. “It was stalled.”

Hudson attributes the delay in part to the need to achieve consensus between the 18 governmental departments and agencies that follow the Common Rule. Research has changed dramatically since the policy was established. Clinical trials are now frequently conducted at multiple sites, with research protocols often reviewed by ethics committees at each place. As a result, it can take a year or more to gain approval for a large, multi-centre trial.

The proposed revisions would authorize a single ethics review for such studies. The NIH plans to enact a similar provision later this year, notes Hudson, but modifying the Common Rule would extend this to other agencies.

The update also suggests simplifying reviews of research deemed to be of minimal risk to participants. This would reduce the burden on, among others, social scientists who are conducting surveys or collecting oral histories. Emanuel says that this would better protect participants by allowing overtaxed ethics committees to focus their attention on higher-risk research.

Another major provision would require a person’s consent to the storing of samples for unspecified future research. At present, such consent is required only when a subject’s name or other identifying information is associated with the material. Stripping those data frees the sample for distribution to researchers without consent.

But the rise of genomic sequencing has called into question whether such samples can ever be truly anonymized. Researchers have been able to trace the identities of some subjects on the basis of their DNA sequences. “The people who are participating in research and providing pieces of themselves should be providing permission as well,” says Hudson.

That change could put a damper on some research, notes Barbara Koenig, a medical anthropologist at the University of California, San Francisco. “There’s a huge public benefit from the research done with de-identified samples,” she says. “Requiring explicit consent is going to throw a wrench in that.”

Dora Hughes, a senior policy adviser at the law firm Sidley Austin in Washington DC, says that the stricter requirements could also affect the pharmaceutical and medical-device industries. But she commends the HHS for not applying those requirements retroactively to existing samples, a possibility that the department once considered, she says (Hughes is a former HHS counsel). “That discussion raised the spectre of millions of samples that could not be used for research and would otherwise go to waste,” she says.

It is not clear how long the HHS will take to finalize the changes, but Hudson says that it is unlikely to wait another four years. She adds that the revision would play an important part in facilitating the planned US Precision Medicine Initiative, a massive government effort to collect genetic, physiological and other health data from 1 million volunteers. “This is really important,” Hudson says. “We can’t dilly-dally.”
Today’s most widely used encryption methods will not be strong enough to resist quantum computers.


INFORMATION SECURITY 


Encryption faces quantum foe


Researchers urge readiness against attacks from future-generation computers.


BY CHRIS CESARE 


It is an inevitability that cryptographers dread: the arrival of powerful quantum computers that can break the security of the Internet. Although these devices are thought to be a decade or more away, researchers are adamant that preparations must begin now.

Computer-security specialists are meeting in Germany this week to discuss quantum-resistant replacements for today’s cryptographic systems: the protocols used to scramble and protect private information as it traverses the web and other digital networks. Although today’s hackers can, and often do, steal private information by guessing passwords, impersonating authorized users or installing malicious software on computer networks, existing computers are unable to crack standard forms of encryption used to send sensitive data over the Internet.

But on the day that the first large quantum computer comes online, some widespread and crucial encryption methods will be rendered obsolete. Quantum computers exploit the laws that govern subatomic particles, so they could easily solve the mathematical problems on which those encryption methods rely.

“I’m genuinely worried we’re not going to be ready in time,” says Michele Mosca, co-founder of the Institute for Quantum Computing (IQC) at the University of Waterloo in Canada and chief executive of evolutionQ, a cybersecurity consulting company.

It will take years for governments and industry to settle on quantum-safe replacements for today’s encryption methods. Any proposed replacement, even if it seems impregnable at first, must withstand multitudes of real and theoretical challenges before it is considered reliable enough to protect the transfer of intellectual property, financial data and state secrets.

“To trust a cryptosystem, you need a lot of people to scrutinize it and try to devise attacks on it and see if it has any flaws,” says Stephen Jordan, a physicist at the US National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland. “That takes a long time.”

This week’s workshop, held at the Schloss Dagstuhl-Leibniz Center for Informatics in Wadern, is one of several this year bringing together cryptographers, physicists and mathematicians to evaluate and develop cryptographic tools that are less vulnerable to quantum computers. NIST hosted its own workshop in April, and the IQC will team up with the European Telecommunications Standards Institute for another, in early October in Seoul.

Intelligence agencies have also taken notice. On 11 August, the US National Security Agency (NSA) revealed its intention to transition to quantum-resistant protocols when it released security recommendations to its vendors and clients. And in a memo posted on its website earlier this year, the Dutch General Intelligence and Security Service singled out a looming threat that adds even more urgency to the need for quantum-safe encryption. In a scenario it calls ‘intercept now, decrypt later’, a nefarious attacker could start intercepting and storing financial transactions, personal e-mails and other sensitive encrypted traffic and then unscramble it all once a quantum computer becomes available. “I wouldn’t be at all surprised if people are doing that,” says Jordan.

As far back as 1994, mathematician Peter Shor showed that a quantum computer would be able to quickly foil RSA encryption, one of the major safeguards used today (P. W. Shor. Preprint available at http://arxiv.org/abs/quant-ph/9508027v2; 1995). At the time, it was not clear whether such a machine would ever be built, says Mosca, because researchers assumed that it would need to operate flawlessly. But a theoretical discovery in 1996 showed that, up to a limit, a quantum computer with some flaws could be just as effective as a perfect one.
Published experiments with small quantum devices are starting to approach this fault-tolerance threshold, notes Mosca. And because secretive organizations such as the NSA are keenly interested in the technology, it is widely assumed that these published results do not represent the cutting edge of research. “We have to assume there’s going to be people that are a few years ahead of what’s available in the public literature,” says Mosca. “You can’t wait for the headlines in The New York Times to have your plan in place.”

The safety of today’s Internet traffic relies in part on a type of encryption called public-key cryptography, which includes RSA, to establish secret communication between users. A sender uses a freely available digital key to lock a message, which can be unlocked only with a secret key held by the recipient. The security of RSA depends on the difficulty of breaking up a large number into its prime factors, which are needed to work out the secret key. In general, the larger the number, the harder this problem is to solve.
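To make the locking and unlocking concrete, here is a minimal toy sketch of textbook RSA, assuming tiny illustrative primes (61 and 53) and the hypothetical helper names `make_keys`, `encrypt` and `decrypt`. In RSA the secret key is an exponent `d` that can be computed only by someone who knows the prime factors of the public modulus `n`. Real deployments use moduli of 2,048 bits or more together with padding schemes, both omitted here; this sketch is not secure.

```python
from math import gcd

def make_keys(p, q, e=65537):
    """Build a toy RSA key pair from two primes p and q (illustration only)."""
    n = p * q                  # public modulus
    phi = (p - 1) * (q - 1)    # Euler's totient of n; requires knowing p and q
    if gcd(e, phi) != 1:
        raise ValueError("e must share no factor with (p-1)(q-1)")
    d = pow(e, -1, phi)        # private exponent: inverse of e mod phi (Python 3.8+)
    return (n, e), (n, d)

def encrypt(message, public_key):
    """Anyone holding the public key (n, e) can lock an integer message < n."""
    n, e = public_key
    return pow(message, e, n)

def decrypt(ciphertext, private_key):
    """Only the holder of the private exponent d can unlock it."""
    n, d = private_key
    return pow(ciphertext, d, n)

public_key, private_key = make_keys(61, 53, e=17)
ciphertext = encrypt(42, public_key)
assert decrypt(ciphertext, private_key) == 42
```

With these parameters the public key is (3233, 17) and the private exponent works out to 2753; nothing about `d` can be computed from `n` without first splitting it into 61 and 53.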

Researchers believe that it takes existing computers a long time to factorize big numbers, partly because no one has yet discovered how to do it quickly. But quantum computers could factorize a large number exponentially faster than any known method for a conventional computer, and this nullifies RSA’s reliance on factoring being difficult.
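Why factoring is the crux can be shown on a toy key. The sketch below, assuming the illustrative public key (3233, 17) and hypothetical function names, recovers the private exponent using classical trial division, whose running time grows roughly with the square root of `n` and so becomes hopeless for real key sizes. It does not implement Shor’s quantum algorithm; it only demonstrates that anyone able to factor `n` can rebuild the secret key from public information.

```python
from math import isqrt

def factor(n):
    """Split a composite n into two factors by trial division.

    The loop runs up to about sqrt(n) times: instant for n = 3233,
    astronomically slow for the 2,048-bit moduli used in practice.
    """
    if n % 2 == 0:
        return 2, n // 2
    for f in range(3, isqrt(n) + 1, 2):
        if n % f == 0:
            return f, n // f
    raise ValueError("no factor found up to sqrt(n); n appears to be prime")

def recover_private_exponent(n, e):
    """Rebuild the RSA secret exponent from the public key alone, via factoring."""
    p, q = factor(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse (Python 3.8+)

print(factor(3233))                        # prints (53, 61)
print(recover_private_exponent(3233, 17))  # prints 2753
```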

Several options already exist for new public-key cryptosystems. These replace the factoring problem with other difficult mathematics problems that are not expected to yield to quantum computers. Although these systems are not perfectly safe, researchers think that they are secure enough to protect secrets from quantum computers for all practical purposes.

One such system is lattice-based cryptography, in which the public key is a grid-like collection of points in a high-dimensional mathematical space. One way to send a secret message is to hide it some distance from a point in the lattice. Working out how far the encrypted message is from a lattice point is a difficult problem for any computer, conventional or quantum. But the secret key provides a simple way to determine how close the encrypted message is to a lattice point.
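The idea can be sketched with Regev’s learning-with-errors (LWE) scheme, one well-known lattice-based construction, swapped in here as an illustration; it is not necessarily the specific system the researchers have in mind. The public key holds noisy values that sit close to, but not on, lattice points, and the secret vector lets the recipient strip away everything except a small noise term. The modulus, dimensions and noise range below are tiny assumed values chosen so that decryption always succeeds; they offer no security.

```python
import random

Q = 97   # modulus (toy value)
N = 8    # length of the secret vector (toy value; real schemes use hundreds)
M = 20   # number of noisy samples in the public key

def keygen(rng):
    """Return a public key (A, b) and the secret vector s."""
    s = [rng.randrange(Q) for _ in range(N)]
    A = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    e = [rng.choice((-1, 0, 1)) for _ in range(M)]  # small noise
    # b = A*s + e (mod Q): close to the lattice point A*s, but pushed off it by e.
    b = [(sum(a * x for a, x in zip(row, s)) + err) % Q for row, err in zip(A, e)]
    return (A, b), s

def encrypt(bit, public_key, rng):
    """Encrypt a single bit (0 or 1)."""
    A, b = public_key
    r = [rng.randrange(2) for _ in range(M)]  # random subset of the samples
    u = [sum(r[i] * A[i][j] for i in range(M)) % Q for j in range(N)]
    v = (sum(ri * bi for ri, bi in zip(r, b)) + bit * (Q // 2)) % Q
    return u, v

def decrypt(ciphertext, s):
    """v - u.s equals bit*(Q//2) plus noise of size at most M (mod Q)."""
    u, v = ciphertext
    w = (v - sum(uj * sj for uj, sj in zip(u, s))) % Q
    # Values near Q/2 decode to 1; values near 0 (or near Q) decode to 0.
    return 1 if Q / 4 < w < 3 * Q / 4 else 0

rng = random.Random(2015)
public_key, secret = keygen(rng)
for bit in (0, 1, 1, 0):
    assert decrypt(encrypt(bit, public_key, rng), secret) == bit
```

Each ciphertext carries one bit, encoded as a shift of about Q/2. Because the accumulated noise never exceeds 20 in size while Q/4 is about 24, rounding always recovers the bit; without `s`, separating the shift from the noise is the hard lattice problem.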

A second option, known as McEliece encryption, hides a message by first representing it as the solution to a simple linear algebra problem. The public key transforms the simple problem into one that seems much more difficult. But only someone who knows how to undo this transformation (that is, who has the private key) can read the secret message.
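The McEliece approach can be illustrated with a deliberately tiny error-correcting code. The sketch below assumes the Hamming(7,4) code, which corrects a single bit error, in place of the much larger binary Goppa codes of McEliece’s actual proposal; all names are hypothetical. The private key is a random invertible scrambling matrix S plus a column permutation; the public key is the disguised generator matrix S·G·P. Anyone can encode a 4-bit message and add one deliberate error, but only the key holder can undo the permutation, correct the error with the code’s parity checks and unscramble the result. A code this small is trivially breakable.

```python
import random

# Systematic generator matrix G = [I | A] of the Hamming(7,4) code.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
# Parity-check matrix H = [A^T | I]; its seven columns are distinct and non-zero,
# so the syndrome of a single bit error pinpoints the error's position.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def vec_mat(v, mat):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] & mat[i][j] for i in range(len(v))) % 2
            for j in range(len(mat[0]))]

def mat_mul(a, b):
    return [vec_mat(row, b) for row in a]

def inverse_gf2(mat):
    """Gauss-Jordan inversion over GF(2); returns None if mat is singular."""
    n = len(mat)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def keygen(rng):
    while True:  # draw random 4x4 matrices until one is invertible
        S = [[rng.randrange(2) for _ in range(4)] for _ in range(4)]
        S_inv = inverse_gf2(S)
        if S_inv is not None:
            break
    perm = list(range(7))
    rng.shuffle(perm)
    SG = mat_mul(S, G)
    # Public generator: column j is column perm[j] of S*G, i.e. S*G*P.
    public = [[row[perm[j]] for j in range(7)] for row in SG]
    return public, (S_inv, perm)

def encrypt(msg, public, rng):
    """Encode a 4-bit message with the public matrix and flip one random bit."""
    c = vec_mat(msg, public)
    c[rng.randrange(7)] ^= 1
    return c

def decrypt(c, private):
    S_inv, perm = private
    y = [0] * 7
    for j in range(7):
        y[perm[j]] = c[j]  # undo the column permutation
    syndrome = [sum(h & x for h, x in zip(row, y)) % 2 for row in H]
    if any(syndrome):
        columns = [[H[r][j] for r in range(3)] for j in range(7)]
        y[columns.index(syndrome)] ^= 1  # correct the single error
    # The code is systematic, so the first four bits are msg*S; undo S.
    return vec_mat(y[:4], S_inv)

rng = random.Random(1978)
public, private = keygen(rng)
assert decrypt(encrypt([1, 0, 1, 1], public, rng), private) == [1, 0, 1, 1]
```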

One drawback of these replacements is that they require up to 1,000 times more memory to store public keys than existing methods, although some lattice-based systems have keys not much bigger than those used by RSA. But both methods encrypt and decrypt data faster than today’s systems, because they rely on simple multiplication and addition, whereas RSA uses more-complex arithmetic.

PQCRYPTO, a European consortium of post-quantum-cryptography researchers in academia and industry, released a preliminary report on 7 September recommending cryptographic techniques that are resistant to quantum computers (see go.nature.com/5kellc). It favoured the McEliece system, which has resisted attacks since 1978, for public-key cryptography. Tanja Lange, head of the €3.9-million (US$4.3-million) project, favours the safest possible choices for early adopters. “Sizes and speed will improve during the project,” she says, “but anybody switching over now will get the best security.”


Germany claims success for elite universities drive


Report praises €4.6-billion scheme to make leading universities more competitive, but some smaller institutions have done just as well.


BY QUIRIN SCHIERMEIER & RICHARD VAN NOORDEN


For a decade, Germany’s government has been trying to explode the myth that all the country’s universities are equal. In 2006, it launched an 11-year, €4.6-billion (US$5-billion) programme that aimed to make the best German universities more competitive with the likes of Oxford, Cambridge and Harvard. The campaign, called the Excellence Initiative, led to 14 institutions gaining the unofficial label of ‘elite’.

A 3 September report by Germany’s main research-funding agency, the DFG, which administers the initiative together with Germany’s science council, suggests that the cash influx is paying off. Still, a German equivalent of the US Ivy League may be slow to form. An analysis by Nature’s news team shows that some universities less favoured by the initiative have improved just as quickly as the elites when it comes to generating highly cited work. “It doesn’t require the ‘elite’ label to produce good research in Germany,” says Alfred Forchel, president of the University of Würzburg, an institution that has kept pace without top-up funds.

The DFG sees this as positive. “The Excellence Initiative has met expectations,” says Dorothee Dzwonnek, DFG secretary-general. “And it has not weakened universities which don’t directly benefit from it.” But some critics say that the scheme has benefited administrators more than scientists. And a huge increase in research funding across Germany over the past decade makes it difficult to tease out the influence of the initiative on the country’s improvement.

The DFG report, an analysis of funding in German universities that is released every three years, marks the first attempt to measure preliminary outcomes of the initiative. In 2011–13 alone, 45 universities received a total of more than €1 billion for running international graduate schools and setting up specific clusters of excellence. A subset of these also received an extra €10 million to €14 million each per year for ‘institutional strategies’ to strengthen the university as a whole, the most prestigious part of the competition (see Nature 487, 519–521; 2012).

The elite group includes some of Germany’s largest and best-equipped research universities, such as the Ludwig Maximilian University of Munich and RWTH Aachen University. The report shows that the elites dominate when it comes to winning competitive grants from the DFG. As a group, they secured more than 40% of the agency’s total funding in 2011–13. However, the same 14 institutions won almost the same share of DFG funding in 2002–04, before the initiative had launched.

Scientific output is booming at the 45 universities that got cash out of the Excellence Initiative, the DFG report points out. They have boosted their output by 43% in chemistry and physics since 2002, more than the 34% increase in these subjects across all German universities.


And a further analysis by Nature finds that the 14 elites alone now produce 35% of Germany’s total articles, up from a share of 29% in 2002.

GERMANY RISING
Germany’s research articles are rapidly becoming more highly cited. But some of the country’s smaller universities are matching the rise of its ‘elite’ institutions.
[Chart: proportion of articles in the world’s top 10% (by citations) over time for Harvard University, the 14 elite German universities, 5 other German universities*, Germany and the United States]
*Universities of Bonn, Ulm, Leipzig, Regensburg, Würzburg. Analysis of articles in Scopus database using Elsevier’s SciVal tool.
Source: Scopus/SciVal


THE REST CHASE THE BEST

But the Excellence Initiative may not be separating the elites from the rest when it comes to the quality of research papers. Nature’s analysis shows that almost one-quarter of articles from the elites are now in the world’s top 10% by citations, up from one-sixth 12 years ago. Yet it also shows that some other German universities that received much less funding, or no top-up funds, have matched this rise (see ‘Germany rising’). That is enlightening, says Karl Ebeling, president of the University of Ulm, which had little success in the initiative but is higher in some international rankings than elite universities in Bremen and Konstanz, for instance.

Björn Brembs, a neurobiologist at the University of Regensburg, thinks that the unclear
impact of the initiative on creating elites is 
because the cash was poorly spent. He has 
delved into German employment statistics and 
found evidence, he says, of booming academic
bureaucracy. “For every scientist who has been 
recruited thanks to the Excellence Initiative, 
four new administrative positions were cre- 
ated,” he says. “It is hardly surprising that elite 
institutions have no research advantage over 
some other universities when the group that 
benefits most from the Excellence Initiative is 
not involved in science.” 

The DFG says that it has not looked at how 
the programme may have burdened university 
administrations. “It has attracted 4,000 tal- 
ented foreign scientists to German universities 
and it has greatly increased these universities’ 
scholarly output,” says Dzwonnek. “From our 
point of view, this is a real success.” Many agree 
that the competition, despite ambiguity over its 
measurable impacts, has served German science 
well. It was a positive shock to Germany's struc- 
turally conservative science system, says Jakob 
Edler, executive director of the Manchester 
Institute of Innovation Research, UK. 

The results of a comprehensive evaluation 
of the Excellence Initiative by an international 
panel of experts are due in January 2016. The 
federal government and Germany’s 16 states, 
which have tentatively agreed to continue the 
programme, will then decide about its future. 
“The Excellence Initiative promotes fresh ideas 
and new collaborations. I do hope it continues 
beyond 2017,” Forchel says.
TRILLIONS
OF TREES


SURVEY OF SURVEYS FINDS 422 TREES 
FOR EVERY PERSON ON EARTH 




Three trillion: the latest estimate of the planet’s 
tree population, published in this issue of 
Nature (see page 201), exceeds the number of 
stars in the Milky Way. At more than 7 times the 
previous estimate of 400 billion, the figure is 
impressive, but it should not necessarily be 
taken as good news. The forest-density study — 
which combined satellite imagery with data 
from tree counts on the ground that covered 
more than 4,000 square kilometres — also 
estimated that 15 billion trees are cut down 
each year. And in the 12,000 years since 
farming began spreading across the globe, the 
number of trees on our planet has fallen by 
almost half. 
For an animated data visualization, visit: go.nature.com/h8ucmu
[Graphic key: one icon = 10 billion trees; line height represents forest density in 1 km²; forest density scale up to > 1 million trees per km².]
SOURCE: T. W. CROWTHER ET AL. NATURE 525, 201-205; 2015


FOREST PLANET


Climate and human activity determine 
the distribution of trees around the 
world. Europe, India and eastern China 
have lost much of their original forest 
cover, and Africa’s woodlands are 
severely fragmented. 


1. NORTHEASTERN NORTH AMERICA 

Farms, orchards and sheep took over the landscape of 
northeastern North America in the 1800s, when much of
the region's forest was harvested for timber. Today, the
six US states of New England are more than 80% 
forested — but suburban sprawl and other factors 
present new threats. 




LAY OF THE LAND 


Despite deforestation caused by farming, ranching, mining and logging, tropical
areas still contain an astounding 43% of the planet’s trees. Tree densities are
greatest in the northern boreal and tundra forests, which can contain more than
1,000 trees per hectare. (Percentages are rounded.)
2. HISPANIOLA

The effects of deforestation are stark on the 
Caribbean island of Hispaniola. The Dominican 
Republic, on the eastern side of the island, has tree 
cover that is four times denser than that in 
neighbouring Haiti, which has been forced to cut 
down trees for fuel. 
3. SOUTHEAST ASIA

Forests in southeast Asia have changed drastically 
since the 1970s. From 1973 to 2009, Thailand and 
Vietnam lost 43% of their forest cover; Cambodia and 
Laos lost 22% and 24%, respectively. If current trends 
continue, more than 30% of the region’s remaining 
forest will be cleared by 2030. 
LEAF OF NATIONS


The tropics host many densely forested countries, but
nations with boreal forest, such as Finland, have the
highest tree densities. At the other extreme are desert
and island nations, and some impoverished countries.
THE REVOLUTION WILL NOT BE
CRYSTALLIZED


BY EWEN CALLAWAY 
In a basement room, deep in the bowels of a steel-clad building in
Cambridge, a major insurgency is under way. 

A hulking metal box, some three metres tall, is quietly beaming 
terabytes’ worth of data through thick orange cables that disappear 
off through the ceiling. It is one of the world’s most advanced cryo-electron
microscopes: a device that uses electron beams to photograph
frozen biological molecules and lay bare their molecular shapes. The 
microscope is so sensitive that a shout can ruin an experiment, says 
Sjors Scheres, a structural biologist at the UK Medical Research Council 
Laboratory of Molecular Biology (LMB), as he stands dwarfed beside the 
£5-million (US$7.7-million) piece of equipment. “The UK needs many 
more of these, because there’s going to be a boom,” he predicts.

In labs around the world, cryo-electron microscopes such as this 
one are sending tremors through the field of structural biology. In the 
past three years, they have revealed exquisite details of protein-making 
ribosomes, quivering membrane proteins and other key cell molecules, 
discoveries that leading journals are publishing at a rapid clip. Structural
biologists say — without hyperbole — that their field is in the midst of a 
revolution: cryo-electron microscopy (cryo-EM) can quickly create high- 
resolution models of molecules that have resisted X-ray crystallography 
and other approaches, and labs that won Nobel prizes on the back of ear- 
lier techniques are racing to learn this upstart method. The new models 
reveal precisely how the essential machinery of the cell operates and how 
molecules involved in disease might be targeted with drugs. 

“There's a huge range of very important biological problems that are 
now open to being tackled in a way that they could never before,” says
David Agard, a structural cell biologist at the University of California, 
San Francisco. 

Scheres was recruited to the LMB several years ago to help push 
cryo-EM technology to its limits — and he and his colleagues have done 
just that. Last month, they reported one of the burgeoning field’s most 
impressive feats: a startlingly clear picture of an enzyme implicated in 
Alzheimer’s disease, showing the position of its 1,200 or so amino acids 
down to a resolution of a few tenths of a nanometre¹.

Biologists are now pushing the technique further to deduce ever more 
detailed structures of small and shape-shifting molecules — a challenge 
even for cryo-EM. “Whether you call it revolution or a quantum leap, the 
fact is that the gates have opened,” says Eva Nogales, a structural biologist 
at the University of California, Berkeley. 


CRYSTAL COAXING 

Spend a bit of time with a structural biologist and they will probably 
mention their field’s unofficial motto: ‘structure is function’. Only by
knowing the atom-by-atom arrangement of a biomolecule can research- 
ers grasp how it works — how, for instance, the ribosome reads strands 
of messenger RNA to manufacture proteins, or how molecular pores flip 
open and shut. For decades, one technique enjoyed a near monopoly 
in elucidating protein structures to this level of detail: X-ray crystallo- 
graphy, in which scientists persuade proteins to form into crystals, then 
blast X-rays at them and decipher the protein's structure from patterns 
that the X-rays make when they bounce off (see ‘Structure solvers’). Of 
the more than 100,000 entries in the Protein Data Bank, a popular reposi- 
tory of protein structures, about 90% were solved by this technique. It has 
contributed to more than a dozen Nobel prizes, including the one awarded 
in 1962 for revealing DNA’s double helix.

But although X-ray crystallography has been structural biologists’ best 
tool, it also has major limitations. It can take researchers years to find ways
of forming some recalcitrant proteins into large crystals that are suitable 
for analysis, and many fundamentally important molecules — such as 
proteins that are embedded in cell membranes or that make up complex 
molecular machines — have defied crystallization. 

X-ray crystallography was certainly king when biologist Richard 
Henderson arrived at the LMB in 1973 to study a protein called bacterio- 
rhodopsin, which uses light energy to pump protons across a membrane. 
Henderson and his colleague Nigel Unwin had managed to make two- 
dimensional crystals from the protein, but they were unsuitable for X-ray 
diffraction. So the pair decided to try electron microscopy instead. 

At the time, electron microscopy was used to study viruses or slices of 
tissue that had been treated with heavy-metal stains. A beam of electrons 
is fired at a sample, and the emerging electrons are detected and used to 
map out the structure of the materials they smashed into. This approach 
produced the first detailed image of a virus — a tobacco pathogen — but 
the stain made it difficult to see individual proteins, let alone the atomic 
details that the X-rays were revealing. “It was blobby stuff or negative-stained, and you would see outlines of molecules,” says Agard.

In a pivotal step, Henderson and Unwin omitted the stain when they used electron microscopy to image crystal sheets of bacteriorhodopsin; instead, they placed the crystals on metallic grids to make the protein stand out. “You were looking at the atoms in the protein,” says Henderson, who, with Unwin, published² the structure of bacteriorhodopsin in 1975. “That was such a huge step forward,” Agard says. “That said, ‘OK, it will be possible to solve protein structures by EM’.”

NATURE.COM
See a selection of stunning cryo-EM structures at: go.nature.com/cehow8

The cryo-EM field developed through the 1980s and 1990s; a key advance was the use of liquid ethane to flash-freeze proteins in solution and hold them still³, which is how the ‘cryo’ came to cryo-EM. But still the technique could generally resolve structures only at resolutions coarser than 10 ångströms (1 Å is one-tenth of a nanometre), nothing to rival the better-than-4-Å models of X-ray crystallography, and nowhere near what was needed to use the structures for drug design. While funders such as the US National Institutes of Health were ploughing hundreds of millions of dollars into ambitious crystallography initiatives, support for cryo-EM lagged far behind.

In 1997, when Henderson attended the annual Gordon Research Conference on 3D electron microscopy, a colleague opened the meeting with a provocative statement: cryo-EM was a “niche” method, he said, unlikely to ever supplant X-ray crystallography. But Henderson could see a different future, and he fired back a salvo in the next talk. “I said we should go for global domination of cryo-EM over all the structural methods,” he recalls.


THE REVOLUTION STARTS HERE 

In the years that followed, Henderson, Agard and other cryo-EM 
evangelists worked methodically on technical improvements to electron 
microscopes — in particular, on better ways to sense electrons. Long after 
digital cameras had taken the world by storm, many electron microsco- 
pists still preferred old-fashioned film because it recorded electrons more 
efficiently than did digital sensors. But, working with microscope manu- 
facturers, the researchers developed a new generation of ‘direct electron 
detectors’ that vastly outperforms both film and digital-camera detectors. 

Available since about 2012, the detectors can capture quick-fire images 
of an individual molecule at dozens of frames per second. Researchers 
such as Scheres, meanwhile, have written sophisticated software programs
to morph thousands of 2D images into sharp 3D models that, in many 
cases, match the quality of those deciphered with crystallography. 

Cryo-EM is suited to large, stable molecules that can withstand 
electron bombardment without jiggling around — so molecular 
machines, often built from dozens of proteins, are good targets. None 
has proved more suitable than ribosomes, which are braced by rigid 
twists of RNA. The solution of ribosome structures by X-ray crystal- 
lography won three chemists the 2009 Nobel Prize in Chemistry — but 
those efforts took decades. In the past couple of years, ‘ribosomania’ has gripped cryo-EM researchers, and various teams have quickly determined and published dozens of cryo-EM structures of ribosomes from a multitude of organisms, including the first high-resolution models of human ribosomes⁴,⁵. X-ray crystallography has largely fallen by the wayside in the LMB laboratory of Venki Ramakrishnan, who shared the 2009 Nobel. For large molecules, “it's safe to predict that cryo-EM will largely supersede crystallography,” he says.

The rocketing number of cryo-EM publications suggests this to be true: 
in 2015 alone, the technique has so far been used to map the structures of 
more than 100 molecules. And, unlike X-ray crystallography, in which 
crystals lock proteins in a single, static pose, researchers can use cryo-EM
to calculate the structure of a protein that has been flash-frozen in several 
conformations and so deduce the mechanisms by which it works. 

In May, structural biologist John Rubinstein at the University of 
Toronto, Canada, and his colleagues used around 100,000 cryo-EM 
images to create a ‘molecular movie’ of a rotor-shaped enzyme called 
V-ATPase, which pumps protons in and out of cell vacuoles by burning 
ATP⁶. “What we saw is that everything is flexible,” Rubinstein says. “It’s
bending and twisting and deforming.” He thinks that the enzyme’s flexibility helps it to efficiently transmit energy released by ATP to the pump.

STRUCTURE SOLVERS
X-ray crystallography has long been the dominant method for deducing high-resolution protein structures, but cryo-electron microscopy is catching up.

X-RAY CRYSTALLOGRAPHY
X-rays scatter as they pass through a crystallized protein; the resulting waves interfere with each other, creating a diffraction pattern from which the position of atoms is deduced.


And when a team led by Nogales in 2013 pieced together cryo-EM 
images of a complex that orchestrates the transcription of DNA into 
RNA, they discovered that an entire arm swings 100 Å around the DNA
strand like a crane, potentially influencing whether a gene is transcribed⁷.
“I think this is beautiful,” says Nogales. “It’s a true insight into how these
biological machines work.”


SMALL AND BEAUTIFUL 

Now that cryo-EM has hit its stride, experts are looking for grander 
challenges. For many, the most coveted targets are smaller proteins 
sandwiched in cellular membranes. These tend to be linchpins in cellu- 
lar signalling pathways, as well as popular drug targets. They are also 
notoriously difficult to crystallize, and imaging individual proteins with 
cryo-EM is tough because it is harder to extract the signal from the back- 
ground noise. 

These hurdles did not stop Yifan Cheng, a biophysicist at the University 
of California, San Francisco (UCSF), from attempting cryo-EM on a
small membrane protein called TRPV1, which detects the molecule that 
gives chilli peppers their burn and is closely related to other pain-sensing 
proteins. A team led by his collaborator David Julius, a UCSF physiolo- 
gist, had failed to crystallize the protein. The cryo-EM project was slow- 
going at first, but the same technical advances that drove ribosomania 
produced a 3.4-Å structure of TRPV1 in late 2013. The report⁸ was a
thunderbolt to the field, because it showed that cryo-EM could conquer
small, medically important molecules. “I literally lost an entire night's
sleep when I saw that,” says Rubinstein.

More sleepless nights are likely to follow. “There’s going to be a huge 
explosion in the number of membrane-protein structures that get 
solved,” says Agard. 

One such solution was that published last month¹ by Scheres, structural biologist Yigong Shi of Tsinghua University in Beijing and their team. They produced a model of γ-secretase, a protein that makes the amyloid-β molecule that is linked to Alzheimer’s disease. The 3.4-Å-resolution map reveals that γ-secretase mutations that cause rare inherited forms of Alzheimer’s map to two ‘hotspots’ in the enzyme and seem to influence its ability to form toxic amyloid-β particles. The structure could help researchers to understand why drugs that inhibit the enzyme have failed in past clinical trials, and help them to design new pharmaceuticals. “Stunning” is how Cheng describes the structure.

Results such as these are attracting the attention of drug companies 
hoping to study medically important proteins that have resisted crystal- 
lography. Scheres is working with New York-based pharmaceutical giant 
Pfizer on ion channels, a broad class of membrane protein that includes 
pain-sensing molecules and neurotransmitter receptors. “I've been contacted by almost everybody,” says Nogales of the drug companies lining
up at her door. 

But despite the advances, many in the field see room for further 
improvement. They hope to devise better electron detectors and better 
methods for preparing protein samples. This would allow scientists to 
image proteins that are even smaller and more dynamic, and at even 
greater resolution than before. A 2.2-Å structure of a bacterial enzyme published in May⁹ showed just how sharp cryo-EM structures can get.

CRYO-ELECTRON MICROSCOPY
A beam of electrons is fired at a frozen protein solution. The emerging scattered electrons pass through a lens to create a magnified image on the detector, and the structure can then be deduced.

Like any burgeoning field, this one has growing pains. Some experts 
worry that researchers rushing to use the technique could produce problematic results. A 2013 structure of an HIV surface protein¹⁰ was questioned by scientists who said that the images used to build the model were white noise¹¹. Since then, X-ray and cryo-EM models generated by other teams have challenged the original model, but the researchers have stood by their result¹². This June, at the field’s Gordon conference,
researchers wanting more quality control passed a resolution urging 
journals to provide referees with details of how cryo-EM structures 
were created. 

Costs could slow the spread of the technology. Scheres estimates that 
the LMB spends around £3,000 per day running its cryo-EM facility, plus 
another £1,000 on electricity, most of it for computers needed to store 
and process the images. “You're £4,000 per day lighter if you want to do 
this. That, for many places, is a very high cost,” he says. To make cryo-EM more accessible, some funders have established shared facilities at
which researchers can book time. The Howard Hughes Medical Institute 
(HHMI) operates a cryo-EM lab on its Janelia Farm Campus in Vir- 
ginia that is open to HHMI-funded investigators based elsewhere. In the 
United Kingdom, a national cryo-EM facility funded by the government 
and the Wellcome Trust opened this year in Didcot, near Oxford. “There 
is a real tidal wave of people wanting to learn about it,” says Helen Saibil, 
a structural biologist at Birkbeck, University of London, who helped to 
establish the UK facility. 

Riding the wave is Rod MacKinnon, a biophysicist at Rockefel- 
ler University in New York City, who shared the 2003 Nobel Prize in 
Chemistry for determining the crystal structure of certain ion channels, but who is now deep into cryo-EM. “I’m on a steep slope of a learning curve, which always thrills me,” says MacKinnon, who hopes
to use the method to study how ion channels open and close. 

Henderson's tongue may have been firmly in his cheek when he 
declared back in 1997 that cryo-EM could rule the structural-biology 
world. But nearly 20 years later, his prediction is looking less like hyper- 
bole than it did then. “If it carries on, and all the technical problems 
are solved, cryo-EM could indeed become, not just a first choice, but a 
dominant technology,” he says. “We are probably halfway there.”


Ewen Callaway writes for Nature from London. 
Fishing for the
first Americans 


Archaeology is moving underwater and along 
riverbanks to find clues left by the people 
who colonized the New World. 


BY EMMA MARRIS 


On 17 September, a catamaran will set off into the Pacific Ocean
on a week-long cruise back to the Pleistocene. Laden with sonar
instruments, the research vessel Shearwater will probe the ocean 
bottom to find places that were beaches and dry land more than 
13,000 years ago, when the sea level was around 100 metres lower. The 
researchers are hunting for evidence that ancient people lived along this 
now-sunken coastline as they colonized the New World. 
Meanwhile, other archaeologists are digging in the intertidal zone on 
a remote island off the shore of British Columbia in Canada, where the 
sea level has barely changed since the ice-age glaciers began to retreat. 
Since late last year, that team has found footprints and a tool that date 
back 13,200 years, making them some of the oldest human marks on
the continent. Whoever left them had to have reached the island by boat.

Evidence of ancient Americans has turned up on Calvert Island in Canada.

Welcome to the newest wave of American 
archaeology: the idea that the first residents of 
the Americas came by sea, hugging the Pacific 
coast as they went south. This theory marks 
a sharp departure from the once-dominant 
hypothesis that Pleistocene hunters from 
Siberia migrated by foot across a land bridge to 
Alaska and then south into the heart of North 
America. This route opened up only when 
the vast sheets of ice covering the continent 
had melted enough to permit passage. It was 
thought that these first migrants made the dis- 
tinctive stone spear tips called Clovis points, 
which began appearing at sites in the interior 
of North America around 13,000 years ago. 

There has long been evidence that others 
reached the New World at least 1,000 years 
earlier. But only in the past decade have archae- 
ologists accumulated enough evidence to 
abandon the Clovis-first model (see Nature 
485, 30-32; 2012). Some of the earliest human 
sites in the Americas date to well before a cor- 
ridor opened up between the ice sheets, which 
is forcing researchers to explore the idea that 
New World colonizers skirted the coastline. 
Travelling by boat, these early people could have 
hopscotched their way south of the ice sheets, 
subsisting on the rich marine resources of the 
ice-free strip along the shore. 

The search for these sea-going settlers will 
not be easy. Much of the evidence that archae- 
ologists seek is deep underwater — or was 
smashed long ago by the Pacific’s legendary 
waves. But momentum is building to find 
those earliest settlers. “People are just more 
optimistic,” says Quentin Mackie, an archae- 
ologist at the University of Victoria in Canada. 
Amanda Evans, a marine archaeologist at the 
ocean-survey company Tesla Offshore in 
Prairieville, Louisiana, says that prehistoric 
underwater archaeology in general is having 
a moment. “This year just seems to be the year 
that everybody was pushing the ball uphill and 
it finally crested.”


TOOLS OF THE TRADE 
Loren Davis, an archaeologist at Oregon State 
University in Corvallis, is searching for the 
ancient seafarers in an unusual spot, at a site
in Idaho called Cooper's Ferry, which is on a 
bank of the Salmon River, hundreds of kilo- 
metres away from the coast. At the dig site in 
August, Davis examines a piece of rock brought 
to him by one of his field crew. He turns it over 
to see whether it was shaped by human hands, 
perhaps by early toolmakers who littered the 
ground with flakes of rock as they worked. 
Although Cooper's Ferry is far inland, Davis 
suggests that it is part of the coastal story. The 
Salmon is a tributary of the mighty Columbia 
River, which would have been the first large
waterway encountered by people who made
it south of the ice sheets during the last gla- 
cial epoch. At that time, valleys farther north 
would have been covered by glaciers. For a 
water-adapted culture, he says, “the first off- 
ramp south of the ice is the Columbia River”. 

Having considered the stone, Davis hands 
it back to his colleague and says, “I think it is a 
flake.” His archaeological pits, which the crew 
has shaped into a series of neat holes, are full of 
flakes and finished ‘western stemmed’ points 
up to 13,200 years old¹. Whereas the Clovis
points are shaped like miniature surfboards, 
the western stemmed points from Cooper's 
Ferry are smaller and look like Christmas trees. 
Points resembling the western stemmed vari- 
ety have been found throughout the western 
United States and in Siberia — a connection 
that suggests they were brought over to the 
New World by early hunters. 


Davis's crew is quietly intent, and the air is filled with the gentle sound of trowels scraping earth, along with a rock wren’s distinctive call. The peace is occasionally broken by shouts between diggers and data recorders: “Bone!”, “Fire-cracked rock!” or “Deb!” (short for debitage, or flakes). The position of each artefact is precisely recorded, then it is bagged up and stored in one of many boxes that are piling up in a nearby trailer. Precise dates will be assigned later, in the laboratory.

A sense of expectation hangs over the dig. If the team uncovers particularly old western stemmed points that definitively pre-date the Clovis era, that would strongly suggest that the first Americans carried these points there by sea and river. “You get a gambler’s mentality,” Davis says. The hunt obsesses the crew, who spend weeks here, camped out and digging for hours each day. Sarah Skinner, an Oregon State student who supervises pit B, says that she wakes up clenching her fists around dream trowels. “When I close my eyes, I see artefacts,” she says.

Archaeologists search for signs of early inhabitants near a riverbank at Cooper’s Ferry, Idaho.


HINGE-POINT HUNT

Signs of early inhabitants are also starting to appear along the coast, particularly in spots where the swelling seas have not covered ancient shorelines since the end of the last glacial period. The western coast of Canada, for example, was pressed down by the Pleistocene ice and has been rebounding upwards since the glaciers melted. In some spots (hinge points), that post-glaciation rebound almost exactly cancels out the rising sea level². One of those hinge points is Calvert Island, where a 13,200-year-old footprint was found late last year and another was discovered this summer. Daryl Fedje and Duncan McLaren, both archaeologists at the University of Victoria in British Columbia, plan to continue working the site to look for signs of the earliest Americans (see ‘Welcome to America’).

The Hakai Institute on Calvert Island, founded by Canadian entrepreneur Eric Peterson, is supporting that work. “As a fourth-generation British Columbian,” Peterson says, “I am intensely interested in the rich history of humans on our coastline, which we now realize goes way, way back. How far back? Thirteen thousand years? Fifteen thousand years? That's what we want to find out.”

“This is probably the biggest effort to identify submerged sites along the Pacific coast.”

WELCOME TO AMERICA
Humans could not have walked into the heart of North America until a corridor opened between the continental ice sheets: 14,000 years ago at the earliest, and possibly not until 12,500 years ago. However, a number of New World archaeological sites date from before then (yellow dots), so researchers are exploring whether humans colonized the Americas by boat.
[Map: land and coastal migration routes; Cordilleran and Laurentide ice sheets at glacial maximum; sites: Calvert Island (13,200 years ago), Cooper’s Ferry (13,200 years ago), the Manis mastodon site (13,800 years ago), Paisley Caves (14,100 years ago), Monte Verde (15,300 years ago), Buttermilk Creek and Debra L. Friedkin sites (15,500 years ago).]

Mackie thinks that Calvert Island and simi- 
lar hinge points will produce results much 
faster than underwater work, which is techni- 
cally challenging and expensive. “You might 
as well just stand on your boat and burn $100 
bills,” he says. 

But despite the enormous cost and technical 
challenges, he and others agree that underwater 
locations may hold tremendous potential and 
that the time has finally come for archaeologists 
of American prehistory to explore the Pacific 
Ocean. There have been a few projects over the 
years. In the late 1990s, Mackie and Fedje did 
some sea-floor mapping around Haida Gwaii, 
an island off the British Columbia coast³. They
took samples of sea-floor sediment and hauled 
up a barnacle-encrusted flake tool that they sug- 
gest dates from 10,000 years ago, when the sea 
floor on which it was found was dry land. More 
recently, the duo used an autonomous under- 
water vehicle to explore and found what they 
suggest could be a fishing weir — a trap made 
from rocks — that dates back 13,800 years. 

Archaeologist James Dixon of the Univer- 
sity of New Mexico in Albuquerque has done 
marine surveys of the now-submerged land that 
once connected Asia and North America. And 
Jon Erlandson, an archaeologist at the Univer- 
sity of Oregon in Eugene, has worked for years 
on the Channel Islands off Southern Califor- 
nia, piecing together evidence for his theory 
that people followed a ‘kelp highway’ down 
the coast. This route would have offered abun- 
dant food — fish, shellfish and marine mam- 
mals — supported by the kelp forest ecosystems. 

But the offshore studies so far have been lim- 
ited, and most of the discoveries have not been 
reliably dated. “There's very few of us, and we are 
spread over vast, vast areas,” says Dixon. There
has been a hesitation to join — or fund — the 
chancy and expensive underwater search. “You 
can go out there and be totally skunked by the weather,” he says. “It is a tough thing.”

That has led many researchers to discount the coastal hypothesis in the past, notes Mackie. “People thought, ‘well, all the information is deep underwater and we'll never find it’,” he says. And so far, nothing older than the Clovis era has been found along the Pacific Coast, either on the sea floor or on land.


SPARK OF INTEREST 

What snapped the field out of its funk was not a charismatic leader or a spectacular find. It was federal bureaucracy.

The US Bureau of Ocean Energy Management (BOEM) was formed in 2010 to regulate energy development on the continental shelf. The bureau is bound by the National Historic Preservation Act, which requires it to make sure that valuable archaeological sites will not be destroyed by any development that requires a federal permit. As interest in offshore renewable-energy projects has increased in recent years, BOEM has scrambled to improve methods for identifying prehistoric sites.

In 2011, it commissioned a sweeping study of possible archaeological sites on the Pacific outer continental shelf. Davis and a colleague at Oregon State, archaeologist Alex Nyers, worked on the report, using existing ocean-depth data and estimates of sea-level rise to decipher where previous shorelines would have been. They then modelled where prehistoric sites might be clustered: presumably on gentle, south-facing (and thus warmer) slopes and near lakes, rivers, bays and islands, all now submerged.

That report came out in 2013, and led directly to a US$600,000 grant from BOEM to seek out evidence for the predictions about prehistoric environments. On a series of cruises off California and Oregon over the next three years, researchers will use a variety of sonar instruments to survey the ocean floor and sediments below. If they identify a possible estuary, beach or other ancient shoreline feature, they will take core samples and carbon-date biological material from the various layers of sediment to confirm the find.

Principal investigator Todd Braje, an anthropologist at San Diego State University in California, is trying to expand the project by encouraging the US National Oceanic and Atmospheric Administration and other potential funders to add more cruises. But even at its current size, he says, “This is probably the biggest effort to identify submerged sites along the Pacific coast.”

The investment may be big, but Braje is trying to keep expectations modest. He insists that the goal is to learn how to identify environments in which humans might have camped or settled up to 20,000 years ago; the team is not expecting to find the remains of any settlements, and certainly not ones older than those of the Clovis settlers. “The idea that we are going to hit on a 15,000-year-old site that is underwater is probably unrealistic in the near future,” says Braje. “You get to those first migrants into the New World and the archaeological footprint they left is very small.”

The project will build on Davis's model of submerged environments, using coring and imaging to test whether his projections actually lead them to the right sorts of sites. Davis is a co-principal investigator and will join the Oregon cruises next year.

In the meantime, he is digging in Idaho. It is near the end of the field season, and he and his crew are working on their day off to finish as much as possible. He has bribed them with gourmet cheese, and he lays it out with no fewer than five cheese knives. Combined with the trowels, brushes, scrapers and spoons used by the crew, the site is bristling with tools.

Given all the difficulties of this work, those involved in investigating the ocean-migration hypothesis stress that expectations should remain modest for many years as researchers improve their search methods. If the theory is correct, the first definitive older-than-Clovis find along the coast, the green light for the theory that everyone seems to be hoping for, could still be far off. “It could happen this summer, next summer, it could be ten years,” says Erlandson.

Or it could happen right now in the sweltering pit at Cooper's Ferry, with the very next scrape of a trowel. ■


Emma Marris is a freelance writer in 
Klamath Falls, Oregon. 
Track urban emissions
on a human scale 


Cities need to understand and manage their carbon footprint at the level of 
streets, buildings and communities, urge Kevin Robert Gurney and colleagues. 


Cities are taking steps to combat climate change, given the scant progress made by international treaty negotiations. Los Angeles, California, home to around 4 million people, has one of the most ambitious targets: to reduce greenhouse-gas emissions to 35% below 1990 levels by 2030. The city has calculated its carbon ‘footprint’ and found that road vehicles constitute 47% of total carbon dioxide emissions, and that electricity consumption constitutes 32%. So how should Los Angeles target its policies?

Knowing that certain roads, types of vehicle or parts of a city dominate road emissions, and why people drive at specific times, would tell city planners where and how to lower emissions efficiently. Improvements in traffic congestion, air quality, pedestrian conditions and noise pollution could be aligned. But tracking emissions road by road and building by building is beyond the capacity of most cities.

Luckily, scientists are gathering the data that city managers need, in studies that match sources of CO₂ and methane with atmospheric concentrations. Now the research community needs to translate this information into a form that city managers can use. Emissions data need to be merged with socio-economic information such as income, property ownership or travel habits, and placed in software tools that can query policy options and weigh up costs and benefits. And scientists should help municipalities to raise awareness of the power of detailed emissions data in tailoring climate and development policies.


CARBON HOTSPOTS 

Cities account for more than 70% of global fossil-fuel CO₂ emissions, the main driver of climate change. If the top 50 emitting cities were counted as one country, that ‘nation’ would rank third in emissions behind China and the United States. Urban areas are set to triple globally by 2030 (ref. 3).

Much of this emitting landscape falls within the control of mayors, city planners, businesses and community groups that have responsibility for residents’ health and well-being. A 2014 survey lists 228 global cities, representing nearly half a billion people, that have pledged reductions equivalent to 454 megatonnes of CO₂ per year by 2020 (see go.nature.com/inaxr4). Shenzhen in China, for example, aims to put an extra 35,000 electric vehicles on the road by the end of 2015. The German city of Munich aims to produce enough green electricity by 2025 to meet all its power requirements.

Yet such pledges account for only about 3% of global urban emissions and less than 1% of total global emissions projected for 2020 (ref. 4). Rich cities dominate these pledges, yet low- and middle-income countries are experiencing the greatest urban growth.

Slashing emissions requires mapping them on finer scales of space and time that reflect the human dimensions at which carbon is emitted: by individual buildings, vehicles, parks, factories and power plants. These should be tracked at least yearly. Such granular estimates are needed for several reasons: to verify emissions rates; to confirm progress towards reduction and support carbon trading, permits or taxation; to enable more-targeted and financially efficient decisions about mitigation options; and to identify and fix unintentional releases from, for example, leaking gas pipes or malfunctioning methane-capture equipment in landfills.

Cities already approach air-quality improvement, regional development, transport planning and waste disposal on a house or road scale. Adding low-carbon policies to these efforts could benefit them all. For example, reducing traffic congestion would lower air pollution and traffic accidents and improve commutes. And targeting residents’ immediate needs widens public acceptance.


THE PROBLEMS 

Although methods to account for community-scale emissions have been designed by non-profit organizations such as the World Business Council for Sustainable Development and the World Resources Institute (see go.nature.com/q7wjeb), most cities lack independent, comprehensive and comparable sources of data. The expertise and staff required to build this information are costly. Transparency of data and methods is also crucial to enable verification by third parties and to build trust.

Scientists are starting to meet these challenges. In the past five years, ‘bottom-up’ estimations of carbon emissions from fuel reporting, traffic data, building information and human activity have been merged with ‘top-down’ atmospheric measurements over cities of CO₂, methane and ¹⁴CO₂, an isotope of CO₂ that reflects fuel combustion. Such efforts began in the late 2000s in Paris and in the US cities of Indianapolis, Boston, Salt Lake City and Los Angeles; more are planned for São Paulo, Brazil, and cities in Australia, China, the United Kingdom and Canada. These studies cost millions of dollars, and involve at least a dozen monitoring sites, analysis of remotely sensed data and modelling efforts. Many of these data sets are now publicly available.

The Los Angeles Megacities Carbon Project will measure urban carbon levels from the ground and sky.

Links between ground-based and satellite remote sensing are improving. For example, Japan’s Greenhouse Gases Observing Satellite (GOSAT) has shown that space-borne CO₂ measurements can constrain the ‘domes’ of the gas that lie above cities. This work will continue with NASA's Orbiting Carbon Observatory 2 (OCO-2), which launched in July 2014.

Future space missions (such as OCO-3, planned for 2018) will have a ‘city mode’ that will monitor urban areas and power plants monthly. The European Space Agency’s Sentinel-5 mission (to launch in 2016) should provide near-global measurements of large methane emitters on the urban scale every few days or weeks. Work is also under way to characterize infrastructure in high-resolution images. Complemented by ground-based information such as traffic data from mobile phones, this could reveal, for example, which types of building or location account for disproportionate urban emissions and why.

Much remains to be learned before the science is suitable for policymakers and planners. For example, what level of granularity and accuracy is most useful? How many atmospheric monitoring stations are sufficient to calibrate or anchor emissions inventories? How does this scale with city size or types of emission (road versus industrial, say)?

Existing information systems are cumbersome, and although good at quantifying emissions, they are unable to explain the roots and controls of carbon flows. Researchers need to understand the relationships between urban carbon fluxes and the social norms, technology, economics and institutional constraints that drive emissions. This is especially important in low- and middle-income countries.




INTERNATIONAL COLLABORATION 

More collaboration among disciplines is needed. For example, engineers have modelled how emissions change when transit systems or compact urban development strategies are introduced. But technological and infrastructure changes are rarely modelled within socio-ecological systems. Social scientists are examining the connections between wealth, population size or density and carbon emissions, but not within realistic, economically constrained, engineered landscapes.

NASA/JPL-CALTECH

Translating urban carbon science into solutions requires two key steps. First, it must become ‘operational’. Like weather stations, data and forecasting, the measurement, monitoring and modelling of urban carbon flows is a global need that is best accomplished collectively. This requires long-term collaborative funding and institutional support beyond the typical three-year research-grant cycle.

Second, an independent intergovernmental centre (with regional representation) is needed to ensure standardization and to set priorities. This could be funded jointly by governments, foundations and intergovernmental institutions. Such an ‘urban carbon solutions centre’ must generate practical results, tools and carbon-mitigation options with the involvement of community groups, mayoral staff and energy providers. Cities could pay the solutions centre to provide information tailored to their locale. Some work could be undertaken by the private sector.

With detailed knowledge of carbon flows, cities might succeed in reducing global emissions where nations have failed. ■
Women are funded more
fairly in social science 


UK data hold lessons for how to close the gender gap in 
bioscience grant applications, success and size, argue 
Paul Boyle and colleagues. 


Despite the increasing commentary and debate on gender disparities in science, equality will not be achieved without proactive support from key institutions.

One of the key drivers of academic inequality is the receipt of competitive grant funding. In the biomedical sciences, women get smaller grants than men in the United States and the United Kingdom.

NATURE.COM: For Nature's special issue on women in science, see: nature.com/women

Similarly, figures from the European Research Council (ERC) for 2007-13 show that women make only one-quarter of grant applications, and they receive just one-fifth of awards. This pattern is evident at different rates across disciplinary domains: in the physical sciences and engineering, women submit 17% of grant applications and receive 15% of awards; in the life sciences, 30% and 21%; and in the social sciences and humanities, 36% and 31% (see go.nature.com/ngfvc3).
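The ERC shares quoted above can be condensed into one comparison per domain. The short Python sketch below assumes that every application and award not made by a woman was made by a man, and that the shares (which are rounded) describe the same pool of applications. Under those assumptions, women's share of awards divided by their share of applications gives their success rate relative to the overall rate, and dividing by the same quantity for men gives women's success rate as a fraction of men's. The function name `relative_success` is illustrative only.

```python
def relative_success(p_apps: float, p_awards: float) -> float:
    """Women's success rate divided by men's (1.0 means parity).

    p_apps:   women's share of applications (0-1)
    p_awards: women's share of awards (0-1)
    Assumes men account for all remaining applications and awards.
    """
    women = p_awards / p_apps              # women's rate relative to the overall rate
    men = (1 - p_awards) / (1 - p_apps)    # men's rate relative to the overall rate
    return women / men


# ERC 2007-13 shares as quoted in the text: (applications, awards)
erc = {
    "overall": (0.25, 0.20),
    "physical sciences and engineering": (0.17, 0.15),
    "life sciences": (0.30, 0.21),
    "social sciences and humanities": (0.36, 0.31),
}

for domain, (apps, awards) in erc.items():
    print(f"{domain}: {relative_success(apps, awards):.2f}")
```

With these rounded inputs the ratios come out at about 0.75 overall, 0.86 for the physical sciences and engineering, 0.62 for the life sciences and 0.80 for the social sciences and humanities, where 1.0 would mean that women and men succeed at the same rate.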

We find that UK social-science funding does not show such gender bias. When academic position is accounted for, the data we present here show little difference between female and male social scientists in application rate, success rate and grant size. We discuss some lessons that these results may hold for the biomedical sciences.

ILLUSTRATION BY BELLE MELLOR

CHECKING THE BALANCE
Data from the United Kingdom’s main social-science funding body show little difference between female and male social scientists in application rate, success rate and grant size.
1 | Overall grant applications and success: 18% of applications from men and 18% from women were successful (82% unsuccessful).
2 | Grant applications with age: women’s application rate and success declined with age.
3 | Grant applications and awards by professional grade: among non-professors (readers, senior lecturers, lecturers, researchers), 15% of men and 17% of women were successful; among professors, 20% of each.
4 | Amounts awarded: median size of awarded grants, by non-professorial and professorial grade. Total budget split: over 5 years, only 41% of the total £127 million went to women, because fewer women are professors.


WHO GOT WHAT? 

We considered applications to the UK Economic and Social Research Council (ESRC) Research Grants ‘open call’ scheme between 2008 and 2013 (one of us, P.B., was chief executive of the ESRC from 2010 to 2014). We examined whether women and men submitted a similar number of grant applications, and their respective success rates and sizes of awarded grants. We used data from the UK government’s Higher Education Statistics Agency on numbers of men and women in social-science academic jobs in the United Kingdom. The results described here are underpinned by robust multivariate analyses.

We found that women were less likely than 
men to apply for grant funding (making up 
41% of applications), even though there were 
only slightly fewer women (48%) than men 
in social-science academic posts. But women 
and men were equally successful in winning 
ESRC grants (18% success rate for both; see 
‘Checking the balance’). 

Women’s application rates and funding 
success declined with age; men in all age 
groups had a similar success rate. Women 
under the age of 40 applied for as many 
grants as men in that age group, and were 
more successful. Women over 50 applied for 
fewer grants and were less successful than 
men in the same age group. 

Comparing applications and success by academic position, we find that it is the smaller number of female professors that accounts for the overall difference in grant applications between men and women, and the greater success of older men. Male and female professors were equally successful, and women at lower grades were slightly more successful than men at the equivalent grade. Indeed, female professors were more likely to bid than their male counterparts: women made 30% of professorial applications, even though nationally only 24% of professorial posts were held by women.

The median amount awarded was not significantly different for women and men. On average, female professors won slightly larger amounts than male professors. However, over the five-year period, 59% of the £127 million (US$198 million) allocated went to men, because fewer applications were received from women (recall that, accounting for academic position, those who applied for grants were equally successful).
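The budget split and the professorial bidding figures above reduce to simple arithmetic. Below is a minimal Python sketch that uses only the rounded figures quoted in the text and assumes, as a simplification, that the national share of professorial posts held by women (24%) describes the pool of professors eligible to apply to the ESRC. The variable names are illustrative.

```python
# ESRC Research Grants open call, 2008-13: figures as quoted in the text
total_gbp_million = 127.0
share_to_men = 0.59

to_men = total_gbp_million * share_to_men
to_women = total_gbp_million - to_men
print(f"to men: £{to_men:.1f} million; to women: £{to_women:.1f} million")

# Applications per professor, women relative to men.
# Women made 30% of professorial applications but held 24% of professorial posts
# (assumption: the national share of posts stands in for the eligible pool).
women_share_of_apps = 0.30
women_share_of_posts = 0.24

per_head_women = women_share_of_apps / women_share_of_posts
per_head_men = (1 - women_share_of_apps) / (1 - women_share_of_posts)
bid_ratio = per_head_women / per_head_men
print(f"female professors bid {bid_ratio:.2f} times as often per head as male professors")
```

On these inputs, roughly £74.9 million went to men and £52.1 million to women, and female professors submitted about 1.36 times as many applications per head as male professors.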

The analyses we present are based on standard, routinely available data. Thus, several caveats should be borne in mind. These include the difficulties of combining different data sources on staff eligible to apply for awards and actual applications, and a lack of further detailed information on applications, such as the number and gender of co-applicants.

182 | NATURE | VOL 525 | 10 SEPTEMBER 2015

© 2015 Macmillan Publishers Limited. All rights reserved

Even so, there are clearly disciplinary differences in women’s funding engagement and success. A comparison of figures from the UK Medical Research Council (MRC) and the ESRC shows that although the proportion of women in biomedical and social-science disciplines is similar (43% and 45%, respectively), the proportion of female grant applicants in 2012-13 was 27% at the MRC and 42% at the ESRC (see go.nature.com/pesa2z). Furthermore, ESRC grants secured by female social scientists are of comparable size to those awarded to men. By contrast, at the Wellcome Trust, a major UK biomedical-research charity, awards between 2000 and 2008 were on average around £44,500 (around 15%) bigger for men than for women.


DISRUPTING HIERARCHIES 

Whether these differences are a result of endemic discriminatory practices that discourage women from applying for awards (and for larger ones) in biomedical disciplines should be the focus of intense scrutiny.

It is interesting to consider why women may be better served in the social sciences. The positive consequences of higher levels of female representation in social-science disciplines include a move away from ‘conventional gender expectations’ that align with hierarchical, individualistic and competitive behaviours. Social scientists have long been engaged with feminist research-management practices, with the guiding principles of consultation, collaboration and social equality, which have disrupted male hierarchies. Critiques of knowledge creation that exclude women as both researchers and participants have ensured that men in the social sciences have long been aware of the ingrained, institutionalized male culture of universities, an awareness that may be taking longer to permeate the science, technology, engineering and mathematics (STEM) disciplines.

SOURCE: ESRC

Even so, the lack of women in professorial positions means that 59% of the total funds disbursed by ESRC between 2008 and 2013 in this study were allocated to men. Young female social scientists of today submit as many ESRC applications as equivalent men, are as successful and receive grants of comparable size; it is quite possible that they will maintain this success as they age. Yet this is unlikely to transpire if women cannot access the more senior positions that men have dominated. And as these women rise through the ranks they will not experience the same work-life balance as men, the same child or parental care responsibilities, or the same cultural attitudes to the importance of their labour. Consequently, they will be more likely to have part-time or fixed-term contracts and to take career breaks.

In other words, young women perform well today, but they will continue to match men only if structural changes are implemented within universities and funding agencies (see ‘Ten ways institutions must close the gender gap’). If the current pace of change continues, it will take 39 years for women to be represented equally among the UK professoriate, and this is likely to vary widely by discipline (see go.nature.com/gwihpt).

BOLD ACTION
Ten ways institutions must close the gender gap

Steps for funding agencies worldwide
• Commit to ambitious expectations for gender performance that link to eligibility for receiving awards, following the lead of the National Institute for Health Research.
• Introduce targets for minimum gender representation on funding panels.
• Train selection panels on gender-equality issues, including unconscious bias.
• Submit data annually to independent scrutiny of gender differences in applications, success rates and award sizes.
• Publish figures to allow cross-agency and cross-national comparison by discipline.

Steps for universities worldwide
• Publish gender breakdowns in key areas including promotions, appointments and rewards in a consistent way, allowing for cross-institution comparison; such transparency would allow prospective employees and students to assess the institutional culture.
• Embed gender-equality issues in work practice. Become beacons of good practice for public-sector and private employers.
• Support women’s career progression through the ongoing development of promotion criteria that focus on quality rather than quantity.
• Engage men in championing gender equality. Commit to the principles and uptake of shared parental leave.
• Celebrate women’s achievements equally in a public way.


SIGNS OF IMPROVEMENT
It would be wrong to assume that nothing is being done. All seven UK research-funding councils, through Research Councils UK, have published expectations for themselves and for institutions in receipt of their funding, and these statements include an ongoing commitment to promoting cultural change in relation to equality and diversity. The Athena SWAN initiative provides a framework for addressing gender imbalances in biomedicine and has catalysed action, particularly since the National Institute for Health Research made attainment of an Athena SWAN silver award a requirement for certain large-scale funding.

Remarkable progress has been made elsewhere, most notably in Nordic countries. In Finland, for example, equality legislation introduced 20 years ago requires a minimum representation of 40% of either gender on any committee responsible for public spending, including research funders. Although controversial, even among some ardent proponents of gender equality, the rule has resulted in substantial change. By 2010, women made up 50% of the board of the Academy of Finland and of the country’s scientific committees.

Despite such signs of improvement, gender inequality along the science-career trajectory continues to be pervasive. Men earn more than women; academics who are mothers are less likely to be promoted and have lower salaries than women who do not have children; and ‘Brian’ is more likely to be hired than ‘Karen’ as a professor, even if they have identical applications. Consequently, there are fewer women in senior professorial, administrative and university-president roles. Although women make up 47% of non-professorial higher-education positions in the United Kingdom, they account for less than 20% of professorial appointments.

Significant change is unlikely without some bold restructuring. Bringing together funding agencies and a consortium of prominent universities that have shown commitment to these issues to develop coordinated approaches could have a significant impact. Organizations such as Science Europe and the Global Research Council, which have already committed to helping to reduce gender inequalities in science, should lead the way. ■


Paul J. Boyle is president and vice- 
chancellor of the University of Leicester, UK. 
From 2010 to 2014, he was chief executive 
of the United Kingdom’s Economic and 
Social Research Council. Lucy K. Smith is 
a senior research fellow, Nicola J. Cooper is 
professor of health-care evaluation research, 
and Kate S. Williams is a senior research
fellow in the Department of Health Sciences, 
University of Leicester, UK. Henrietta 
O’Connor is professor in the Department of 
Sociology, University of Leicester, UK. 
Richard Dawkins, pictured at home in 2010, popularized a gene-based view of evolutionary biology.


Dawkins, redux 


Nathaniel Comfort takes issue with the second 
instalment of the evolutionary biologist’s autobiography. 


In 1976, Richard Dawkins, then a 35-year-old Oxford lecturer in animal behaviour, published his first book, The Selfish Gene (Oxford University Press). Distilling a body of recent population-genetics research, notably that of W. D. Hamilton, it argued that genes, not organisms, were the targets of natural selection. An organism, Dawkins wrote, was simply a gene’s way of replicating itself. The book was a surprise best-seller. Along with E. O. Wilson’s Sociobiology (Harvard University Press, 1975), it helped to spark a new nature-nurture debate that pitted sociobiologists against socialist biologists. Notable among the latter were the palaeontologist Stephen Jay Gould and the population geneticist Richard Lewontin, who accused the sociobiologists of rationalizing social evils such as racism and infidelity as genetically hard-wired, evolutionarily programmed. Yet Dawkins’s fiercely reductionist, materialist world view exuded a transgressive sexiness, and his suave, swaggering prose appealed to many readers, lay and professional.

Dawkins’s greatest gift has been as a lyricist. With terms such as selfish genes, memes and the extended phenotype, he has provided much of the vocabulary of modern evolutionary biology. He has published a sackful of books laying out the evidence for evolution, against design in nature, and for natural selection as the only mechanism of adaptive evolution. A skilled and popular lecturer, he also discovered a taste for the camera, hosting numerous television documentaries.

Brief Candle in the Dark: My Life in Science
RICHARD DAWKINS
Bantam: 2015.

In the early 2000s, he saltated from popularizer into evangelist. His 2006 book The God Delusion (Bantam) was an ecclesiophobic diatribe, published around the same time as Christopher Hitchens’s God Is Not Great (Twelve, 2007) and similar books by Daniel Dennett and Sam Harris. The gospels of Christopher, Daniel, Sam and Richard form the scripture of the ‘new atheism’, a fundamentalist sect that has mounted a scientistic crusade against all religion.

Now, Brief Candle in the Dark revisits Dawkins’s career since The Selfish Gene. Its predecessor, An Appetite for Wonder (Bantam, 2013), was a memoir of a young upper-class Englishman becoming a scientist, replete with African adventures, British public schools and Oxonian traditions. Some reviewers wondered whether the sequel would have more heft and focus, reflection and introspection. At 450 pages, it is certainly heftier.

Dawkins has organized Brief Candle 
thematically, making it less memoir than 
annotated catalogue. The first few chapters 
are a scattershot record of his duties as an 
Oxford don, a rare field trip and the Royal 
Institution Christmas lectures. The next few 
devolve into a series of lists: his books, his 
debates, his television appearances. 

Finally, he abandons the memoir format to do what he does best: write about science. The book concludes with a mammoth 120-page chapter recapitulating the ontogeny of his thought. Like Francis Galton, the hereditarian Victorian biostatistics pioneer, Dawkins has a quantitative turn of mind, but is better at algorithms than theorems. So indeed is life itself, which is why biology has so few laws.

Much of Dawkins’s research has been in silico, writing programs for evolutionary simulations. In his simulations, life is utterly determined by genes, which specify developmental rules and fixed traits such as colour. The more lifelike his digital animals (“biomorphs”) become, the more persuaded he is that real genes work in roughly the same way. Dawkins’s critics accuse him of genetic determinism. This synopsis of his work shows that his life virtually depends on it.

A curious stasis underlies Dawkins’s thought. His biomorphs are grounded in 1970s assumptions. Back then, with rare exceptions, each gene specified a protein and each protein was specified by a gene. The genome was a linear text (a parts list or computer program for making an organism), insulated from the environment, with the coding regions interspersed with “junk”.

Today's genome is much more than a script: it is a dynamic, three-dimensional structure, highly responsive to its environment and almost fractally modular. Genes may be fragmentary, with far-flung chunks of DNA sequence mixed and matched in bewildering combinatorial arrays. A universe of regulatory and modulatory elements hides in the erstwhile junk. Genes cooperate, evolving together as units to produce traits. Many researchers continue to find selfish DNA a productive idea, but taking the longer view, the selfish gene per se is looking increasingly like a twentieth-century construct.

REX FEATURES

Dawkins's synopsis shows that he has not adapted to this view. He nods at cooperation among genes, but assimilates it as a kind of selfishness. The microbiome and the 3D genome go unnoticed. Epigenetics is an “interesting, if rather rare, phenomenon” enjoying its “fifteen minutes of pop science voguery”, which it has been doing since at least 2009, when Dawkins made the same claim in The Greatest Show on Earth (Transworld). Dawkins adheres to a deterministic language of “genes for” traits. As I and other historians have shown, such hereditarianism plays into the hands of the self-styled race realists (N. Comfort Nature 513, 306–307; 2014).

His writing can still sparkle. He excels at 
capturing the scenes behind a scene, deftly 
explaining a scientific principle, capping a 
story with an amusing anecdote. His tale of 
palaeoanthropologist Richard Leakey haul- 
ing his legs (amputated after a plane crash) 
to Kenya in his hand luggage for burial is 
funny and touching. Dawkins also makes an 
important case for the “poetic” side of science, 
arguing that the imperative to justify research 
in terms of potential medical or financial ben- 
efits bleeds the beauty out of it. Amen. 

At such moments, one feels transported to a
tweedy evening at Oxford, pouring the sherry 
as a charming senior faculty member holds 
court. But too often, the professor rambles. He 
quotes friends’ and colleagues’ tributes from 
dust-jackets and afterwords. He mentions the 
fish genus Dawkinsia. He repeatedly slams his 
late rival, Gould (“whose genius for getting 
things wrong matched the eloquence with 
which he did so”). His digressions often come 
off as twee and self-indulgent. Mentioning the
limping family dog, Bunch, in an apt example 
of an acquired characteristic that cannot be 
inherited, he is reminded of an unfinished 
poem his mother wrote after Bunch died, 
which he prints. “If you can’t be sentimental
in an autobiography, when can you?” he asks. 

For a time, Dawkins was a rebellious 
scientific rock star. Now, his critique of reli- 
gion seems cranky, and his immovably geno- 
centric universe is parochial. Brief Candle 
is about as edgy as Sir Mick and the Rolling 
Stones cranking out the 3,578th rendition of 
‘Brown Sugar’ — a treat for fans, but
reinscribing boundaries rather than crossing them.


Nathaniel Comfort is professor of the 
history of medicine at Johns Hopkins 
University in Baltimore, Maryland. His latest 
book is The Science of Human Perfection. 
Twitter: @nccomfort 
Books in brief


Brain Storms: The Race to Unlock the Mysteries of Parkinson’s 
Jon Palfreman FARRAR, STRAUS & GIROUX (2015) 

In 2011, journalist Jon Palfreman was diagnosed with Parkinson’s 
disease. The progressive neurodegenerative condition, characterized 
by tremors and muscular rigidity, affects 7 million people worldwide. 
In this lucid overview, Palfreman interlaces the history of research 
into the disease — linked, like Alzheimer’s, to a rogue protein — with 
therapeutic approaches from deep brain stimulation to the drug 
L-DOPA. Extraordinary case studies abound, such as that of a man
who can ride a bicycle but not walk, and dancer Pamela Quinn, who 
has devised workarounds that ‘trick’ the body into movement. 


Places of the Heart: The Psychogeography of Everyday Life 

Colin Ellard BELLEVUE LITERARY (2015) 

Why would a street evoke unease, or a shopping centre the desire 
to spend? Psychologist Colin Ellard explores the intersection of 
neuroscience and urban design for answers. Meshing recent 
findings with thoughtful appraisals of their implications, Ellard looks 
at spaces and the awe, lust, boredom, affection or anxiety that they 
trigger. He is richly insightful, particularly on digital encroachments 
into the experience of place: can augmented-reality gear ever vie 
with the hair-prickling thrill of being there? Ellard argues that virtual 
immersion could take a “metaphysical toll”; it is hard not to agree.


Elephants and Kings: An Environmental History 

Thomas R. Trautmann UNIVERSITY OF CHICAGO PRESS (2015) 

The intelligence, majestic presence and physical prowess of the 
Asian elephant were not lost on India’s monarchs. As historian
Thomas Trautmann shows in this scholarly environmental history, 
the beast’s usefulness in warfare and its prodigious dietary needs 
ensured royal protection for swathes of forest in ancient India, where 
the wild animals were captured for specialized training. That the 
country still has 30,000 elephants is a testament to their enduring 
place in the collective imagination; but as Trautmann argues, India’s 
surviving patchwork of 31 elephant reserves may not sustain them. 


Why You Can Build it Like That: Modern Architecture Explained 
John Zukowsky THAMES & HUDSON (2015) 

From the squat circularity of New York City’s Guggenheim Museum 
to Abu Dhabi’s swooning, tornado-shaped Capital Gate skyscraper, 
extreme architecture is here to stay. This illustrated roll call by 
architectural historian John Zukowsky zips through 100 “iconic and 
iconoclastic” structures of the past 50 years — shapely, hideous
or energizingly weird. Norman Foster’s Spaceport America in New
Mexico, for instance, resembles a giant horseshoe crab in thin- 
shelled concrete, whereas Myron Goldsmith's McMath-Pierce Solar 
Telescope in Arizona is a minimalist ode to the right angle. 


Nature and Wealth: Overcoming Environmental Scarcity and 
Inequality 

Edward B. Barbier PALGRAVE MACMILLAN (2015) 

In this cogent analysis, economist Edward Barbier reveals an 
economic landscape of degraded environments and social 
inequality. The culprit, he argues, is a structural imbalance in 
which natural resources are overexploited and human capital is 
undersupplied. Examining current challenges such as ecological 
scarcity, he concludes that a strategy to rebalance natural and 
human capital is the way forward, however difficult.
Barbara Kiser
Outdated listing puts
species at risk 


Loopholes could allow illegal 
wildlife traders and hunters in 
China to evade prosecution or 
to receive reduced sentencing. 
The problem stems from China’s 
Protected Species List (PSL): 
this has not been updated since 
it was implemented in 1989, 
resulting in incongruity with 
newer taxonomy. 

Appendices I and II of the 
Convention on International 
Trade in Endangered Species of 
Wild Fauna and Flora (CITES) 
and the International Union for 
Conservation of Nature (IUCN)
use taxonomic classifications 
based on recent revisions to 
geographical distributions and 
phylogenetic relationships. Some 
species that were formerly listed 
as exotic to China under CITES 
have had their listing changed to 
native. But because the PSL has 
not been revised accordingly, the 
endemic status of such species is 
not recognized in law. 

For example, the Chinese 
pangolin (Manis pentadactyla) 
is on the PSL as a native 
species. The endangered 
Malayan and Indian pangolins 
(Manis javanica and Manis 
crassicaudata) receive 
protection as exotic animals 
under CITES Appendix II, but have been
endemic to the country since 
at least 2000. In our view, 
these species should be added 
to the PSL immediately to 
ensure that they have the same 
protected status as the Chinese 
pangolin in national legislation. 
The taxonomic status of leaf 
monkeys (Trachypithecus 
spp.) and the Burmese python 
(Python bivittatus) has also
become inconsistent with the 
PSL, leading to similar risks for 
these species. 

As a result, illegal traders
can claim that these animals 
with biogeographical or name 
revisions are not on the PSL, 
even though they may be 
endangered. To alleviate transborder
inconsistencies and aid enforcement,
all 181 signatory nations to CITES
need to adopt unambiguous,
standardized and internationally
coherent naming policies,
following the IUCN Red List
and CITES Species+ (www.speciesplus.net).

Zhao-Min Zhou* Yunnan Public 
Security Bureau for Forests, 
Kunming, Yunnan, China. 
zhouzm81@gmail.com 

*On behalf of 6 correspondents (see 
go.nature.com/hubzzy for full list). 


Physicists’ report on 
EU green electricity 


The European Physical Society 
has released a report on 
European Union (EU) plans 
for sustainable production
of green electricity in the
context of today’s global 
energy and climate challenges 
(see go.nature.com/2blxp9). 
The report advises Europe to 
develop a common energy 
policy that could act as a
template for other regions. 

It points out that Europe’s 
contribution to global 
greenhouse-gas emissions is 
relatively low, so producing 
electricity without fossil fuels 
would cut global emissions by 
a mere 3-4%. Any plans for 
worldwide green electricity 
structures would need to 
address problems such as 
intermittency and storage, and 
the need for backup systems and 
large, high-capacity electrical 
grids. 

The report suggests that 
energy targets should be 
scientifically justifiable and 
adjusted to be more realistic. 
This would reduce the cost of 
enforcing the targets through 
regulation and encourage 
competition in EU industry. A 
common policy is needed for 
implementing those regulations, 
the report emphasizes. 

It also recommends that the 
public should have access to 
scientific information on energy 
issues rather than to simplified 
plans and projections, and it
urges Europe to continue to lead
the way in cutting greenhouse- 
gas emissions. 

Jozef Ongena Laboratory for 
Plasma Physics, Royal Military 
Academy, Brussels, Belgium. 
Christophe Rossel European 
Physical Society, Mulhouse, 
France. 

j.ongena@fz-juelich.de 


Laboratory seawater 
studies are justified 


In our view, your report 
‘Seawater studies come up short’ 
(Nature 524, 18-19; 2015) fails 
to capture the nuances of the 
survey results you discuss (see 
C. E. Cornwall and C. L. Hurd
ICES J. Mar. Sci. http://doi.org/68g;
2015).

Researchers aim to follow the 
2010 ‘Guide to best practices 
for ocean acidification research 
and data reporting’ (go.nature.com/sp5kgn)
as they strive
to understand how marine 
organisms are likely to respond 
to the falling pH of the world’s 
oceans, caused by increased 
carbon dioxide concentrations. 
The logistical constraints of 
testing the effects of seawater 
acidification on marine life in 
the laboratory are considerable. 
However, experiments must 
be replicated while complying 
with the requirements for 
manipulating and monitoring 
seawater carbonate chemistry. 

The paper, co-authored by 
two of us, does not conclude 
from its meta-analysis of such 
manipulation experiments that 
all these studies “come up short”. 
Rather, it uses them to highlight 
the importance and challenges 
of proper experimental design 
for such testing. The examples of 
experimental pitfalls it cites are 
intended not as criticisms, but to 
guide future efforts. 

Together with palaeo-record 
investigations, modelling studies 
and natural and manipulated 
field experiments, we believe 
that laboratory experiments 
are crucial to the mechanistic 
understanding and prediction of
ocean-acidification impacts.

Catriona L. Hurd* Institute
for Marine and Antarctic
Studies, University of Tasmania,
Australia.
catriona.hurd@utas.edu.au 

*On behalf of 7 correspondents (see 
go.nature.com/vucoxr for full list). 


More extensive tests 
for e-cigarettes 


We are concerned that
the focus on nicotine in
electronic cigarettes is causing 
other associated risks to be 
underestimated (see Nature 523, 
267; 2015). 

For example, there are no 
moves to evaluate the chemical 
hazards attributable to the 
non-nicotine components of 
e-cigarettes, such as vaporizing 
solvents or liquid ‘flavourings’
— let alone to regulate them.
Neither will the latest European 
legislation on tobacco and 
related products help (see 
go.nature.com/yufgfv). It does 
not cover liquids or devices that 
are branded as ‘nicotine-free’: 
however, these frequently contain 
nicotine and escape safety testing 
through erroneous labelling (see 
C. Hutzler et al. Arch. Toxicol. 88, 
1295-1308; 2014). 

Electronic vaping could also 
offer a route for using drugs 
such as cannabis (A. J. Budney 
et al. Addiction http://doi.org/7cc;
2015). With the
technology set to evolve and 
spread rapidly, more rigorous 
and extensive evaluation is 
urgently needed. 

Frank Henkler, Andreas Luch 
German Federal Institute for 
Risk Assessment (BfR), Berlin, 
Germany. 
andreas.luch@bfr.bund.de 
OBITUARY


Oliver Sacks 


(1933-2015) 


Neurologist who made house calls. 


The final stage of life, Dante wrote,
is like a ship, gradually lowering its
sails at the approach of the harbour.
It is a serene image of destination — and 
eminently unfit to characterize the final 
decade of Oliver Sacks’s life. Against a tide 
of diminishing health, he added four books 
to an already impressive catalogue: Musi- 
cophilia (2007), The Mind’s Eye (2010), 
Hallucinations (2012) and — only months 
ago — On the Move, a candid sequel to his 
childhood memoir Uncle Tungsten (2001). 
Several more books are nearly finished. 

Oliver Wolf Sacks, who died in New York 
City on 30 August, was born in London in 
1933 into a large Jewish family. His father 
was a general practitioner, his mother a 
surgeon. His aunts and uncles were inven- 
tors, chemists and physicians. He grew up 
with the sense that it was a family duty to be
‘scientific’. In 1939, at the start of the Second
World War, he was sent away to a board- 
ing school in the English Midlands. Sacks, 
who would rather have been in danger with 
his family than safe without them, spent 
four miserable years there. The experience 
scarred him for life: “sent away” is how he 
put it 75 years later in the opening sentence 
of On the Move. 

Reunited with his family in 1943, Sacks 
developed a passion for chemistry. Although 
he eventually chose to study medicine, con- 
templation of the periodic table never ceased 
to soothe him in times of turbulence. Sacks 
studied at Queen’s College, University of 
Oxford, UK, qualifying as a physician in 
1958. He left for the United States in the 
early 1960s and began five years of medical 
training, interspersed with riding motor- 
cycles, working out in gyms, experimenting 
with amphetamines and lifting weights on 
Muscle Beach, California. When a stint in a 
neurochemistry lab ended with a resounding
“Sacks, you are a menace in the lab! Why
don't you go and see patients — you'll do less
harm”, he decided to do just that. In 1965, he
took up consulting at Beth Abraham Hospi- 
tal in the Bronx, New York. 

In its wards he encountered some 80 sur- 
vivors of the ‘sleepy sickness’ pandemic of the 
1920s. He found them to be frozen, mostly, in 
a statuesque, ‘parkinsonian’ state. High doses
of the Parkinson's disease drug L-dopa ‘awoke’ 
them from their lethargy, but — as indicated 
by their vocabulary, likes, dislikes and skills — 
in a state of mind belonging to 40 years before
and in a world that was no longer theirs. Sacks
noticed such diverse reactions from patient
to patient that he adapted what was initially 
intended to be a conventional double-blind 
trial to a series of case histories, which he 
published in 1973 as Awakenings. 

After reading Awakenings, Russian neuro- 
psychologist Aleksandr Luria sent Sacks a 
letter. He praised Sacks’s talent for observa- 
tion and description, which reminded him 
of the nineteenth-century tradition of the 
neurological narrative. Much of what was 
to become vintage Sacks unfolded from this 
book. His work was case-oriented rather 
than population-based, descriptive and
intimate rather than detached. And he wrote
books, not series of papers in neurologi- 
cal journals. To this he added his signature 
approach of making house calls. He tried to 
meet his ‘cases’ in their natural surround- 
ings. He observed, for example, a surgeon 
with Tourette's syndrome while he was oper- 
ating; visited Temple Grandin, a woman with 
autism, in her office in the animal-sciences 
department of Colorado State University in 
Fort Collins; and immersed himself in the 
world of deaf culture. 

The case histories in The Man Who
Mistook His Wife for a Hat (1985) secured
him a worldwide audience. The book also helped
to articulate his scientific credo. Taking 
inspiration from German neurologist Kurt 
Goldstein, Sacks came to think of neurologi- 
cal disorders as challenges to finding a new 
equilibrium. In response to injury or disease,
people go through a phase of adaptation 
and reorganization, often mobilizing inner 
resources that have previously lain dormant. 
According to Sacks, it is the physician's task 
to help patients to achieve a new order by 
being sensitive to these altered orientations. 

As Sacks indicated in On the Move, 
growing up at a time when homosexuality 
was still listed as a mental disorder by the 
American Psychiatric Association alerted 
him to the sometimes detrimental conse- 
quences of psychiatric labelling. Rather than 
locking individuals in a ‘condition’, he took
the upbeat perspective of pointing out the 
benefits over the deficits — sometimes to 
the point of eclipsing the original pathology. 
In many cases, this had a liberating effect: 
one may have Tourette’s syndrome and still 
become a surgeon, or, like Grandin, have 
autism and have a career in science. Sacks 
thought in terms of neurodiversity — the 
idea that conditions result from normal 
variation — well before the term became 
common among those who distanced 
themselves from the medical perspective 
on autism. 

Sacks saw himself as a storyteller, not a 
theorist. He often said that he was happy to 
present the case material that others could 
use to devise grand theories. But each story, 
of course, is a theory. Like Goldstein and 
Luria before him, he let his case histories 
shore up the theory of the brain as an organ 
that should be understood holistically, as 
an organism capable of plasticity and com- 
pensation. Although not the inventor of the 
neurological narrative, Oliver was certainly 
its culmination. For the coming decades, his 
legacy will be safe in the hearts and minds of 
millions of readers. 

In conversation, I once brought up his 
numerous honorary degrees, awards and 
fellowships — but Oliver was quick to raise 
his hand to halt me, and said simply that he 
believed he was a good doctor. He felt that 
his parents recognized that he had become 
a careful and perceptive neurologist. Even 
in his eighties, being a good son was still a 
defining ambition of his life.


Douwe Draaisma is professor of the history 
of psychology at the University of Groningen 
in the Netherlands. He interviewed Oliver 
Sacks in 2005 for his book The Nostalgia 
Factory and stayed in contact with him. 
e-mail: d.draaisma@rug.nl 
Simple market models fail the test


An analysis of energy markets with prices that vary according to demand finds that this market design unexpectedly serves 
to amplify, rather than dampen, fluctuations in power use. 


ALEX PENTLAND 


From the dynamic pricing of electricity
to congestion-based road tolls, simple
market models lie hidden within much
of our current thinking about government, 
regulation and policy. However, this sort of 
market thinking can easily go wrong, as exem- 
plified by Krause et al.' in Physical Review E. 
The authors find that variable energy prices 
that are designed to adapt demand to supply, 
and thereby dampen fluctuations in power use, 
in fact amplify these fluctuations. 

Variable pricing in power markets and other 
market models is based on what the eight- 
eenth-century economist Adam Smith called 
the “invisible hand” — the idea that market 
competition efficiently allocates resources 
according to need. This idea was mathemati- 
cally codified by the start of the twentieth 
century by working out what self-interested, 
rational individuals would do in response to 
different price, usage and market conditions, 
and what effect this would have on prices 
and demand. 

Subsequent criticism of the ‘rational’ part 
of this model has given rise to what is now 
known as behavioural economics. However, 
human limitations on rationality — which 
simply means that individuals know their 
goals and act to achieve them — typically only 
bias the market response, and do not invali- 
date the simple market model. In fact, there are 
far more serious problems concealed within 
the rational-individual market model. The 
two biggest limitations are the focus on how
the average behaviour of independent actors
(often assumed to have a normal distribution)
determines where the market eventually settles
(reaches equilibrium), and the idea that people
act independently. If we applied these assump- 
tions to a classroom in which all the students 
copy answers from each other, and half the stu- 
dents have perfect scores but half have failed, 
we would declare the class a success because 
the average grade is a pass. 

Krause et al. demonstrate what can go wrong 
when we apply this average and equilibrium 
thinking to a situation such as a power market, 
in which people's needs and the external situ- 
ation vary hour by hour and day by day, so 
averages and long-term behaviour do not
capture the full picture. The energy-market
model discussed by the authors is similar to
that of proposed US policy to vary price by
demand, with the intention that individuals
will shift activities that use a lot of energy (such
as air conditioning or heating) to low-demand,
cheaper times of day. The goal of the policy
is to smooth demand over time, allowing the
power company to use more-efficient and
cleaner energy sources.

Figure 1 | Synchronization spikes. Power markets with variable pricing are designed to shift individuals’
power consumption to low-demand times of day. But Krause and colleagues’ modelling’ suggests that
synchronized consumer behaviour may result in amplified spikes in demand.

The authors find that this sort of market 
thinking is too simplistic. They show that 
when there are unusual events, such as an espe- 
cially hot day or a snowstorm, people's actions 
become synchronized, with everyone turning 
up the air conditioning or the heating at the 
time the price normally drops (Fig. 1). As a
consequence, rather than smoothing demand 
as expected, the market will cause huge spikes 
in demand that completely swamp the elec- 
tricity grid, decoupling price and demand, 
and potentially causing power failures and 
even damaging the grid itself. Interestingly, 
this same synchronization process is thought 
to be the source of ‘flash crashes’ that have 
been observed in financial markets, in which 
markets shed huge amounts of value in a few
seconds as high-frequency trading algorithms 
become synchronized’. 
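The code below does not reproduce Krause et al.’s model. It is a deliberately crude, deterministic toy in Python that illustrates the synchronization mechanism described in the paragraph above. All numbers are invented. There are 120 households with a base load that doubles between 12:00 and 18:00. Each household also has one block of flexible load (heating, say). A "price-responsive" fraction of households moves that block to the same cheap night-time hour, which here is assumed to be 03:00. Nothing is calibrated to a real grid; the aim is only to show the qualitative contrast between an ordinary day and an extreme one.

```python
def hourly_demand(n_households, flexible_kwh, responsive_share, cheap_hour=3):
    """Aggregate demand (kWh) for each hour of a 24-hour day in a toy market."""
    # Hypothetical base profile: 1 kWh per household per hour, doubled 12:00-18:00.
    demand = [n_households * (2.0 if 12 <= h < 18 else 1.0) for h in range(24)]
    n_responsive = int(n_households * responsive_share)
    for i in range(n_households):
        if i < n_responsive:
            hour = cheap_hour   # every responsive household sees the same price drop
        else:
            hour = 12 + i % 6   # the rest run the load when they need it, across the peak
        demand[hour] += flexible_kwh
    return demand


for label, flexible in [("ordinary day", 0.6), ("snowstorm", 6.0)]:
    flat = max(hourly_demand(120, flexible, responsive_share=0.0))
    variable = max(hourly_demand(120, flexible, responsive_share=0.5))
    print(f"{label}: peak with flat price {flat:.0f}, with variable price {variable:.0f}")
```

With these made-up parameters, the script reports a peak of 252 kWh with flat pricing and 246 kWh with variable pricing on the ordinary day. On the snowstorm day the figures are 360 kWh and 480 kWh. When the flexible load is large and everyone responds to the same signal, the "cheap" hour becomes a new and higher peak.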

In addition to issues arising from a focus 
on averages and long-term behaviour, the 
assumption that people act independently 
can cause other problems. Obviously, people 
do not act independently — they talk to one 
another and learn from each other. Not only 
is their behaviour sometimes synchronized as 
a response to changing external conditions, as 
discussed by Krause and colleagues, but also 
people actively make their actions more simi- 
lar by trying to copy the successful actions of 
others. These peer-to-peer interactions drive 
the evolution of culture, norms and even the 
law. In the context of market-like situations, 
however, people learning from each other can 
lead to financial bubbles, political upheavals 
and health fads**. Such undesirable outcomes 
are particularly likely when there are large, 
rapid changes in the environment, or when 
the communication links between people are 
particularly strong and influential. 

The advent of social media and the crowdsourcing
of news has made it much easier
for people to learn from each other, and the
pace, inclusiveness and influence of social 
learning and opinion change have increased 
dramatically. The paradoxical consequence of 
this technological move towards greater trans- 
parency, democratization and engagement is 
that fads, political turmoil and bubbles are more 
common than ever before. Because so many 
institutions are based on some sort of simple 
market model, the limitations of this way of 
thinking have become increasingly clear™®.

What can we do about this problem of
overly simple market-based policies? It took 
half a century for policymakers and the pub- 
lic to understand the simple market model's 
connection between price and demand. I 
don’t think we can wait another 50 years for
people to move to a more sophisticated way of
thinking that accounts for synchronization and
connections between people.

Fortunately, we can simulate situations and
visualize the results on today’s ubiquitous
digital devices. If we make computational
modelling of social behaviour a standard part
of policy debate, as Krause and colleagues
have done, we can hope to markedly
accelerate a transition towards better and more robust
social policy. Scientists, policymakers and
science-funding agencies should make this
sort of computational social-science
modelling a regular part of their portfolio.

CELL BIOLOGY

Countercurrents in lipid flow


Two studies find that a lipid-exchange cycle mediates the enrichment of the lipid 
phosphatidylserine in the cell membrane compared with the membrane of an 
organelle called the endoplasmic reticulum, where the lipid is produced. 


ANANT K. MENON & TIM P. LEVINE 


How lipids reach their destinations
inside a cell is largely unknown. Most
are synthesized in an organelle called
the endoplasmic reticulum (ER) and must be 
delivered to other parts of the cell with speed 
and specificity. For example, the lipid phos- 
phatidylserine is enriched on the inner sur- 
face of the plasma membrane. How does it get 
there from the ER, and how does it become 
enriched at the plasma membrane? Two papers 
in Science’” now present a single mechanism 
that resolves both of these problems. 

The membranes that encase the cell and its 
various organelles each have a unique compo- 
sition of proteins and lipids. In most instances, 
proteins are delivered to these membranes 
by vesicle shuttles that originate in the ER. 
But many lipids favour non-vesicular modes 
of transport*. These are faster than vesicu- 
lar trafficking* and can produce consider- 
able asymmetries in the lipid compositions of 
different membranes, leading to lipid enrich- 
ment at the plasma membrane compared 
with the ER. 

Because lipids are insoluble in water’, 
during non-vesicular transport they must 
be shielded from the water-rich cytoplasm 
by proteins that can hold them in an inter- 
nal pocket. Such lipid-transfer proteins 
(LTPs) often act in regions in which the ER 
comes to within 30 nanometres of another
membrane**. One class of LTP known to
mediate ER-membrane contacts is the family 
related to oxysterol-binding protein (OSBP), 
which was originally shown‘ to bind sterols 
such as oxygenated cholesterol derivatives.
Members of this family, named OSBP-related
proteins (ORPs) in mammalian cells and OSBP 
homologues (Osh proteins) in yeast, are char- 
acterized by their OSBP-related lipid-binding 
domain (ORD). 

Osh6 and Osh7 in yeast make complexes 
with phosphatidylserine rather than sterol’, 
indicating that, despite the family name, not 
all ORD-containing proteins are sterol bind- 
ers. It is also known that ORD-containing 
proteins can bind to two different lipids, 
even though LTPs generally show specific- 
ity for one lipid only. The only amino-acid 
residues that are evolutionarily conserved 
in all ORD-containing proteins bind to the 
lipid phosphatidylinositol-4-phosphate 
(PI4P), rather than to phosphatidylserine 
or sterol®’. Furthermore, although there are 
two lipid-binding sites inside the pocket of
ORD-containing proteins, the sites overlap
considerably, meaning that there is room for
only one lipid at a time.

Figure 1 | Cycling lipid transport. Two studies'” describe how members of the OSBP protein family
move the lipid phosphatidylserine (PS) from its site of synthesis in a cellular organelle called the
endoplasmic reticulum (ER) to the inner surface of the plasma membrane, where it becomes enriched.
The OSBP-related protein (ORP5 or ORP8 in mammals, Osh6 or Osh7 in yeast) binds PS and another
lipid, phosphatidylinositol-4-phosphate (PI4P), in a mutually exclusive fashion. a, PS is picked up at the
ER and offloaded at the plasma membrane, in exchange for PI4P. b, PI4P is delivered to the ER, where the
phosphatase enzyme Sac1 hydrolyses it to phosphatidylinositol (PI). The cycle is maintained through the
continual resynthesis of PI4P from PI at the plasma membrane.

50 Years Ago

There is a growing realization that,
in raising productivity ... to the
3.5 per cent national target, efficient
utilization of industrial manpower
plays a critical part. International
comparison with other leading
industrial countries implies that
British management is wasteful in its
use of manpower ... The apparent full
employment of the British economy
is, in fact, a spurious interpretation
of the real employment situation.
Present high employment is falsely
bolstered by the over-manning of the
profitable sectors of the industry, and
retention of labour in industries that
have ceased to be fully economic
... Without improved data sources
or manpower forecast techniques,
it is difficult to see how the massive
labour movements, between
skills, localities of employment or
between employers, that the present
economic analyses imply to be
necessary can be achieved.

From Nature 11 September 1965

100 Years Ago

The Engineer for September 3 has
an article on the employment of
women as machinists, with special
reference to the various shell
factories ... Some of these girls ...
were found to be capable of a good
output on many of the operations
after only a week's instruction ...
The output on some of the
operations exceeded expectation
owing to the keenness of the girls, so
much so that some of the machines
provided have actually been found
to be superfluous, and other
machines have been found to be
capable of more work than had ever
been believed to be possible. There
is plenty of such labour available
in the country, and all the women
are moved by the keenest spirit of
patriotism. We trust that employers
will not hesitate to fill in their blanks
from this source.

From Nature 9 September 1915

In vitro, LTPs can facilitate lipid diffusion 
only down concentration gradients, which 
in vivo would lead to equal concentrations 
of any given lipid at each membrane. Could 
the bispecificity of ORD-containing proteins 
help them to establish asymmetric lipid con- 
centrations? Previous research*"® suggests 
that Osh4 and OSBP mediate the transport of 
cholesterol from the ER to another organelle, 
the Golgi, by balancing cholesterol transport in 
one direction with PI4P transport in the other. 
PI4P is synthesized in the plasma membrane, 
Golgi and vesicles called endosomes", and is 
hydrolysed to phosphatidylinositol by a PI4P 
phosphatase enzyme called Sac1 that resides
in the ER. The lack of PI4P in the ER mem- 
brane ensures that Osh4 and OSBP can bind 
only cholesterol at the ER. On arrival at the 
Golgi, Osh4 and OSBP offload cholesterol in 
exchange for more PI4P, allowing the cycle to 
continue. In this way, even though both choles- 
terol and PI4P are found at lower levels in the 
ER than in the Golgi, cholesterol can become 
enriched in the Golgi”. 
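The countercurrent logic can be made concrete with a deliberately crude toy model. It is not taken from either study, and every number in it is hypothetical. In the model, one carrier with a single binding site shuttles between the ER and the Golgi. Cholesterol and PI4P compete for that site in proportion to their abundance, and binding saturates with a made-up constant `k_half`. PI4P is synthesized only at the Golgi, and "Sac1" is modelled as instantly destroying any PI4P that reaches the ER. Under these assumptions, cholesterol piles up in the Golgi when Sac1 is active. Without Sac1 and without PI4P synthesis, the two cholesterol pools simply equalize, as expected for gradient-driven transfer.

```python
def simulate_countercurrent(cycles, sac1_active, pi4p_synthesis,
                            chol_er=50.0, chol_golgi=50.0,
                            pi4p_er=0.0, pi4p_golgi=10.0, k_half=5.0):
    """Toy mean-field model of one lipid-transfer protein shuttling between
    the ER and the Golgi. All quantities are hypothetical, in arbitrary units.
    Returns the final (ER, Golgi) cholesterol amounts."""

    def load(chol, pi4p):
        # The single site is shared competitively: each ligand is picked up in
        # proportion to its abundance, and binding saturates (k_half).
        denom = chol + pi4p + k_half
        return chol / denom, pi4p / denom

    cargo_chol, cargo_pi4p = 0.0, 0.0  # expected occupancy of the carrier's site
    for _ in range(cycles):
        # ER: release cargo; if active, Sac1 hydrolyses all PI4P present.
        chol_er += cargo_chol
        pi4p_er += cargo_pi4p
        if sac1_active:
            pi4p_er = 0.0
        cargo_chol, cargo_pi4p = load(chol_er, pi4p_er)
        chol_er -= cargo_chol
        pi4p_er -= cargo_pi4p

        # Golgi: release cargo, add newly synthesized PI4P, then reload.
        chol_golgi += cargo_chol
        pi4p_golgi += cargo_pi4p + pi4p_synthesis
        cargo_chol, cargo_pi4p = load(chol_golgi, pi4p_golgi)
        chol_golgi -= cargo_chol
        pi4p_golgi -= cargo_pi4p

    return chol_er, chol_golgi


# With Sac1 and Golgi PI4P synthesis, cholesterol is pumped "uphill" into the Golgi.
er, golgi = simulate_countercurrent(3000, sac1_active=True, pi4p_synthesis=0.5)
# With neither, exchange only equalizes the two cholesterol pools.
er0, golgi0 = simulate_countercurrent(3000, sac1_active=False, pi4p_synthesis=0.0)
```

Both runs start from equal cholesterol pools. Only the first, where PI4P is continually made at one membrane and destroyed at the other, sustains net transport. This mirrors the point that hydrolysis of PI4P is what powers the cycle.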

Could a PI4P countercurrent also drive 
traffic of phosphatidylserine to the plasma 
membrane? In one of the new studies, Moser 
von Filseck et al.” resolved the structure of 
the Osh6-PI4P complex (the structure of the 
Osh6-phosphatidylserine complex is already 
known’), and showed that Osh6 can exchange
phosphatidylserine for PI4P between vesicle
populations in vitro. The ORDs of ORP5 and
ORP8 are the closest mammalian counterparts
to those of Osh6 and Osh7, and, in the other 
study, Chung et al.' showed that the ORD of 
ORP8 contained phosphatidylserine or PI4P 
when purified from cells. 

Both groups showed that the LTPs were 
concentrated at contact sites between the ER 
and the plasma membrane. It is unclear how 
Osh6 and Osh7 achieve this, but Chung and
colleagues found that ORP5 and ORP8 bind
to receptors on both membranes. Both stud-
ies then showed that these proteins transfer
phosphatidylserine from the ER to the plasma
membrane, where the proteins pick up PI4P
that is subsequently delivered to Sac1. This
cycle enriches phosphatidylserine at the
plasma membrane (Fig. 1). The hydrolysis
of PI4P is a crucial ingredient of this mecha-
nism — the LTPs mediated phosphatidylserine
traffic only if they were capable of binding PI4P
too, and, in yeast, Sac1 activity was required for
phosphatidylserine traffic. 

These results will undoubtedly inspire the 
search for other non-PI4P ligands of proteins 
related to OSBP, possibly shedding light on 
the asymmetric transport of many different 
lipids. Although this is an exciting prospect, 
the papers raise several questions. First, does 
Sac1 work exactly as described? Although
both papers propose that Sac1 acts after
delivery of PI4P to the ER, other evidence”?
suggests that it acts on PI4P at the plasma 
membrane. 

Second, is a PI4P countercurrent the only 
way to transport and enrich lipids at the 
plasma membrane, Golgi and endosomes? 
Probably not. Cholesterol and phosphati- 
dylserine might be transported by other 
means’, becoming enriched in the plasma 
membrane through a trapping mechanism. 
Indeed, the possibility that more than one 
mechanism might perform this task is sup- 
ported by the fact that yeast cells lacking 
Osh6 and Osh7 show no defects’. Perhaps 
the otherwise minor contribution of vesicular 
transport is increased in these mutants. This 
could be tested by disabling secretory trans- 
port in cells lacking Osh6 and Osh7 or ORP5 
and ORP8. 

Most data on LTPs have been gathered from 
in vitro experiments, and so the in vivo role of 
these proteins remains enigmatic. Until tech- 
niques are developed to determine whether 
the vast bulk of trafficked lipid molecules are 
solubilized by an LTP, it remains possible that 
in vitro lipid transfer by LTPs is not matched 
by the same activity in cells. Instead, LTPs may 
merely sense lipids, binding them when they 
are abundant to activate downstream targets 
that then mediate traffic'’. Nonetheless, the 
fact that identically arranged countercurrents 
pervade not only the ORD-containing fam-
ily'**' but also the Sec14 LTP family’* is
strong evidence that some LTPs can transfer
lipids in bulk.
NEURODEGENERATION


Amyloid-β pathology
induced in humans 


People who died of the neurodegenerative condition Creutzfeldt–Jakob disease
after treatment with cadaver-derived human growth hormone also developed
some of the pathological traits of Alzheimer’s disease. SEE LETTER P.247 


MATHIAS JUCKER & LARY C. WALKER 


In the 1960s and ’70s, researchers discovered
that a rare but deadly human degenerative
brain disorder called Creutzfeldt–Jakob
disease (CJD) could be transmitted experi-
mentally to animals and, under unusual
circumstances, to other humans’”. Since then,
some have speculated that other neurodegen-
erative diseases might also be transmissible”.
On page 247 of this issue, Jaunmuktane et al.’
present evidence indicating that changes in the
brain that are characteristic of Alzheimer’s dis-
ease have been transmitted between humans.
Transmission probably occurred through injec-
tions of contaminated, cadaver-derived human
growth hormone (c-hGH) that was extracted
from pituitary glands collected at autopsy.
Before 1985, an estimated 30,000 people —
mostly children with growth deficiency —
received injections of c-hGH (refs 2, 5). To
obtain sufficient quantities of hormone for
treatment, thousands of pituitary glands
(a tissue found at the base of the brain) were
pooled and homogenized, and c-hGH was
then chemically extracted (Fig. 1). After dis-
ease incubation times ranging from 5 to more
than 40 years, a small percentage of these
people (up to 6.3%, according to country’)
developed CJD. We now know that the CJD-
causing contaminant in the pituitary extracts
was the prion, a normally produced protein
that becomes infectious and toxic by adopt-
ing an abnormal shape that similarly corrupts
other prion proteins.


Another disease of protein misfolding — 
and the most prevalent form of dementia — is 
Alzheimer’s disease®. The pathological hall-
marks of the disease are insoluble aggregates of
amyloid-β protein (Aβ) called plaques, which
form between neurons; Aβ build-up in the
blood vessels of the brain; and the abnormal
deposition of tau protein in nerve cells (known
as tauopathy). Several lines of evidence indi-
cate that the misfolding and accumulation of
Aβ is an early driver of Alzheimer’s disease,
and that this process precedes the onset of
dementia by well over a decade®. But whether
every person with extensive brain Aβ depo-
sition will ultimately develop Alzheimer’s
disease is a focus of current research. 

It is known that Aβ can aggregate in the
brains of animals if minute amounts of
misfolded Aβ proteins known as seeds are
injected into their brains’. This indicates that Aβ
deposition can be induced through a prion-
like mechanism of corruptive protein templat-
ing’. By identifying a similar phenomenon in
humans, Jaunmuktane and colleagues’ study
provides fresh support for this seeding concept
in a clinically relevant setting.

The authors describe the findings of autop-
sies on eight people who died of CJD at between
36 and 51 years of age, having been treated with
c-hGH approximately 30 years earlier. In addi-
tion to the neurodegenerative changes typical
of CJD, four of the subjects showed extensive
Aβ deposition in the brain and two had sparse Aβ
deposits. Such Alzheimer’s-like changes are
extremely rare at such a young age, and were
not found in patients up to a decade older who
died of prion diseases that were unrelated to
c-hGH treatment. The authors also showed that
the c-hGH-treated subjects did not have any of
the known genetic risk factors for Alzheimer’s
disease. Moreover, they confirmed a previous
report*® that Aβ deposits occur in the pituitary
glands of people with Alzheimer’s disease,
supporting the possibility that aggregates were
induced by Aβ seeds in the c-hGH.

Although an observational study such as
this cannot prove that the Aβ deposits in the
patients’ brains were caused by Aβ seeds,
studies in genetically modified mice have
established that aggregated Aβ can behave
like prions”. Strikingly, when Aβ seeds were
introduced into the abdomens of mice, rather
than directly into the brain, Aβ deposition was
more prominent in cerebral blood vessels than
in Aβ plaques”’. This finding mirrors the vascu-
lar Aβ accumulation observed by Jaunmuktane
et al., and reinforces the supposition that the
Aβ seeds in the affected people travelled to the
brain from elsewhere in the body.

How can future experiments strengthen the
case for the prion-like seeding of Aβ in humans
and better assess its implications? The original
c-hGH extracts, if available, should be assessed
for the presence of Aβ seeds using biochemi-
cal and animal-transmission experiments.
Although age-matched control patients who
died of prion disease had a much lower inci-
dence of Aβ deposits than did the patients who
died of CJD following c-hGH treatment, there
remains a possibility that CJD itself can pre-
cipitate Alzheimer’s-like pathology’’. Under-
standing the mechanisms by which these
different disease processes interact in the brain
could help to explain the frequent coexistence
of multiple degenerative brain diseases
in the elderly’.

Aβ seeds are long-lived in the brain, and
may be even more resistant to degradation
than are prions’. Given the build-up of Aβ in
the pituitary glands of people with Alzheimer’s
disease, and the relatively high prevalence of
the disease in the general population, batches
of c-hGH are more likely to have been contam-
inated by Aβ seeds than by prions, which could
mean that more recipients received injections
containing Aβ seeds. However, it is important
to stress that the subjects of this study died
of CJD, not of Alzheimer’s disease. Whether
those with Aβ lesions would eventually have
manifested clinical Alzheimer’s disease cannot
be known with certainty.

Figure 1 | Contamination of growth-hormone extracts. Before 1985, people
in need of growth-hormone treatment were given cadaver-derived
human growth hormone (c-hGH). To prepare c-hGH, the pituitary gland at the
base of the brain was extracted at autopsy. Of the thousands of glands extracted,
a few contained prions from people with the neurodegenerative condition
Creutzfeldt–Jakob disease (CJD). Jaunmuktane et al.’ report that some of the
glands probably also contained seeds of amyloid-β protein (Aβ), possibly from
people with Alzheimer’s disease. The pooled glands were homogenized and
the c-hGH was then extracted and injected into patients. After approximately
30 years, some recipients died of CJD, owing to a build-up of prions. The
authors show that some of these people also had Aβ deposits in the brain,
suggestive of incipient Alzheimer’s disease.

Continued surveillance of surviving 
c-hGH recipients will be essential to deter- 
mine whether they are at unusually high risk 
of developing Alzheimer’s disease. An earlier 
study® suggests that, as of 2008, c-hGH-treated 
patients in the United States are not more 
likely to develop Alzheimer’s disease than 
people in the general population, although an 
incubation period of 30 years or more is pos- 
sible. Interestingly, the subjects in the current 
study lacked tauopathy, an essential feature 
of Alzheimer’s disease*. Whether tauopathy 
would have emerged over a longer incubation 
period is unknown. 

This transmission of Aβ pathology occurred
in the uncommon context of long-term 
c-hGH therapy. So far, there is no indication 
that Alzheimer’s disease can be transmitted 
between people under ordinary circumstances. 
Furthermore, the replacement of c-hGH by 
genetically engineered growth hormone has 
eliminated the risk that growth-hormone 
treatment will inadvertently transmit brain 
disorders between humans. However, it is 
conceivable that the human transmission of Aβ
seeds can occur under other conditions, which 
must now be carefully defined. Jaunmuktane 
and colleagues’ findings should stimulate new 
research in this direction, and, more gener- 
ally, will inspire further investigation into the 
mechanisms that govern the formation, trans- 
missibility and toxicity of misfolded protein 
seeds in neurodegenerative diseases.
ATMOSPHERIC SCIENCE


Sea-spray particles 
cause freezing in clouds 


Ice clouds in marine regions at high latitudes might form in warmer and drier air 
than was previously believed because of freezing induced by airborne particles 
that contain organic materials from ocean surface waters. SEE LETTER P.234 


LYNN M. RUSSELL 


The oceans cover two-thirds of Earth’s
surface and are almost entirely, and
rather uniformly, composed of water
and inorganic salts’. The remaining fraction 
of a per cent of ocean water contains organic 
material. This has a variable concentration in 
space and time’ and is largely uncharacterized, 
but might be a key component in driving ice 
formation in the atmosphere. On page 234 of 
this issue, Wilson et al.* report that organic 
material concentrated in the topmost milli- 
metres of the ocean has the essential crystal- 
forming properties needed to freeze water and 
form ice clouds in the atmosphere — a process 
called ice nucleation. The findings might help 
to refine predictions of future climate. 

Ice formation in clouds is central to precipi- 
tation processes because it affects whether, 
when and where rain, snow or ice falls out of 
clouds. Climate models calculate the timing
and location of ice clouds and the associated
precipitation partly on the basis of the particle 
types and concentrations that are thought to 
be present in the atmosphere. For example, air 
temperature must drop to almost −40 °C, and
the humidity relative to that at which ice can 
form at that temperature must be well above 
100%, for water to freeze in the atmosphere 
when no ice-nucleating particles are pre- 
sent*”. But different types of particle can pro- 
mote freezing when the air is not as cold or as 
humid as that — by contact with, or immersion 
in, supercooled water droplets (that is, liquid 
droplets cooled to below the ideal freezing 
temperature), by condensation of water onto 
particles or by direct deposition of ice from 
water vapour on the particles (Fig. 1). 

Wilson et al. provide evidence that marine
particles could support ice-cloud formation at
locations (or at times of the year) where dust is
too sparse to nucleate ice efficiently. To do this,
they sampled surface seawater using a variety of
techniques, used X-ray microscopy to chemically
characterize organic material in the water, and
observed droplet freezing (both in situ and in
samples returned to the laboratory).

[Figure 1 graphic: altitude (m) and temperature (°C) plotted against relative humidity (%), showing low-level, mid-level and high-level ice clouds and the freezing processes described in the caption.]

Figure 1 | Ice formation in clouds. The predominant processes for ice formation in the atmosphere
depend on temperature (which changes with altitude) and the relative humidity with respect to that at
which ice can form. In low-level mixed-phase clouds (composed of water droplets and some ice particles),
freezing may occur most effectively when supercooled water droplets come into contact with ice-nucleating
particles (INPs). In mid-level mixed-phase and ice clouds, water vapour condenses on INPs, or INPs
become immersed in water droplets, after which ice crystals form. Ice crystals can also form when INPs
are immersed in supercooled drops of solutions (of salts or of organic compounds, for example), or by
direct deposition of ice on the particles. High-level ice clouds include ice that forms ‘homogeneously’ when
supercooled droplets freeze or water vapour crystallizes in the absence of INPs. Wilson et al.” report that
particles from the ocean surface can act as INPs. (Figure adapted from refs 4 and 5.)

Bubbles bursting at the ocean surface incor- 
porate some of this surface-ocean organic 
material into particles that are lofted into the 
atmosphere, and these particles may have a 
larger role in forming ice clouds than was pre- 
viously calculated in climate models. Indeed, 
Wilson and colleagues show that, when the 
measured ice-forming abilities of organic 
materials are represented in a model’ that 
calculates the effects of sea-spray particles in 
global atmospheric simulations, marine parti- 
cles contribute more to ice nucleation in high- 
latitude regions, where airborne dust is sparse, 
than was previously thought. If these results 
are representative of airborne marine-derived 
particles around the world, the occurrence of 
ice clouds in climate simulations could change 
substantially. The authors’ models suggest that 
the changes will be most evident at high lati- 
tudes that have few continents and little desert 
area, such as the northern Pacific and Atlantic 
oceans and the Southern Ocean. 

Because few measurements of ice-nucleating 
properties have been taken from ocean surface 
layers, the model used by the authors neces- 
sarily extrapolates the global picture from a 
limited number of samples in the surface 
waters of the Arctic and the northern Pacific 
and Atlantic oceans. To refine things further, 
it will be necessary to determine the degree to 
which organic particles from surface waters of, 
for example, the Southern Ocean differ from 
marine particles sampled at other latitudes. 
The simulations could also be improved by 
characterizing the seasonal and biogeochemi- 
cal drivers that change the freezing properties 
of marine organic material and the particles 
that it forms. Longer-term observations are 
needed to assess how year-to-year variability 
in weather and in ocean-nutrient availability 
affects the formation of organic material that 
induces freezing. 

Wilson and co-workers’ findings could 
also have implications for our understand- 
ing of how climate will change in the com- 
ing decades. For instance, as global warming 
occurs, ice clouds might form less frequently 
in warmer air near the ocean’s surface, but 
stronger surface winds could produce more 
marine particles to initiate freezing. These 
two effects may cancel each other out. But if
phytoplankton populations decline, then fewer 
organic ice-freezing particles could be formed, 
which would exacerbate the reduction in ice- 
cloud formation. 

The authors’ work also reveals that marine- 
derived particles containing organic material 
were part of the natural mixture of atmos- 
pheric particles that made ice freeze in pre- 
industrial times, but further work is needed to 
address fundamental questions about marine 
particles in general: how many of them form,
and what fraction contains ice-freezing organic
material? And how do surface winds, ocean 
ecosystems and the state of the sea change both 
of these quantities? 

Finally, little is known about what controls 
the size and composition of particles formed as 
bubbles burst at the ocean surface, but under- 
standing the basic physical processes involved 
is crucial. Limited measurements and semi- 
empirical parameterizations provide only a 
rough basis for climate models to calculate 
the distribution of such particles in the atmos- 
phere. Satellite observations provide some 
constraints on the present-day distributions 
of airborne particles, but without an under- 
standing of the mechanisms of ocean-particle 
formation, the accuracy and certainty of future 
and past contributions from marine particles to
changing climate will continue to be limited.
Glimpse into a primitive
stellar nursery


The first well-resolved images of local-galaxy stellar nurseries that are poor in 
elements heavier than helium give the best picture yet of the conditions in which 
stars may have formed in the early Universe. SEE LETTER P.218 


ADAM LEROY 


Astronomers dub elements heavier than
hydrogen and helium ‘metals’, and
these make up only trace amounts of
all matter by mass. For example, about 2% of 
the interstellar matter in the neighbourhood 
of the Sun comprises metals, the most abun- 
dant of which are oxygen and carbon. These 
elements have a role in catalysing the birth of 
stars that is far out of proportion to their low 
abundance. On page 218, Rubio et al.’ present 
the first well-resolved pictures of metal-defi- 
cient stellar nurseries found in a nearby dwarf 
galaxy, by recording the spectral lines emitted 
by carbon monoxide (CO). The results open 
up CO spectroscopic imaging as a diagnostic 
for exploring the relationship between metal 
content and star formation for substantially 
metal-deficient systems. 

Stars form out of cold, dense clouds of 
molecular hydrogen (H₂). In these clouds,
metals act as coolants, helping the gas to reach 
low temperatures and facilitating its collapse 
into pre-stellar condensations. Metals also 
form interstellar dust, which shields stellar 
nurseries from starlight that would otherwise 
break molecules apart and heat the gas. 

These metals are produced in stellar 
interiors. When stars die and explode, some 
of the newly produced metals are mixed into 
the interstellar gas. Thus, successive gen-
erations of stellar birth and death lead to a
gradual enrichment of heavy elements in the
interstellar medium. These, in turn, aid the 
subsequent formation of new stars. Follow- 
ing this logic backwards, early generations of 
stars probably formed in stellar nurseries that 
contained few metals compared with the Milky 
Way or similar present-day galaxies. Thus, to 
understand the build-up of the first stars and 
galaxies, astronomers must measure how a 
dearth of metals (low metallicity) affects the 
star-formation process. 

To study star formation in metal-poor gas, 
astronomers study the least-massive galaxies 
in the present-day Universe. These dwarf gal- 
axies are not believed to be truly young, and 
so they are imperfect analogues of distant pri-
mordial systems. But because of a combination
of their inefficient star-formation activity and 
weak gravity (exploding stars can blow heavy 
elements entirely out of a small galaxy), they 
are deficient in heavy elements. Therefore, 
researchers use them as local ‘laboratories’ 
to investigate how a lack of metals affects the 
formation of stars in interstellar gas clouds. 

Direct observation of the H₂ that makes up
most of the cold, dense gas in these clouds is
difficult. This forces astronomers to observe
molecular tracers that are mixed with the
H₂, and whose spectral signatures are used
to infer the abundance of H₂ indirectly. The
workhorse tracer is CO, the second most
abundant interstellar molecule’. CO survives
in the interstellar medium mainly in regions
where there is enough dust to shield it from
starlight, and it is studied through its milli-
metre-wavelength emission, which is detect-
able by radio telescopes. In a galaxy such as the
Milky Way, dust is plentiful and CO is fairly
well mixed with H₂. However, the dust (and the
CO itself) is made of heavy elements. Emission
from CO therefore tends to be faint in galaxies
of low metal content.

Figure 1 | The dwarf galaxy Wolf-Lundmark-Melotte (WLM). Rubio et al.’ report that the dense clouds
containing carbon monoxide in this galaxy (not visible here) have similar physical properties to such
clouds in the Milky Way, despite the galaxy’s low abundance of elements heavier than helium. The clouds
reside inside extended stellar nurseries of hard-to-see cold gas. In this composite image of WLM, young
massive stars born in these clouds emit most of the ultraviolet light (blue); optical light (green) is emitted by
all stars. The large reservoir of warm, not yet star-forming, interstellar gas emits in radio wavelengths (red).

There is a long history of hunting for CO in
dwarf galaxies, with the goal of understand-
ing metal-poor stellar nurseries. For decades,
the Small Magellanic Cloud (a dwarf satellite
galaxy of the Milky Way) and a few similar gal-
axies remained the most metal-poor systems
in which stellar nurseries had been detected 
by their CO emissions. A barrier of about one- 
fifth of the Milky Way’s metallicity emerged’ 
as a practical limit to the detection of CO, and 
direct knowledge of stellar nurseries in galaxies 
below this limiting value was largely lacking. 

Two years ago, researchers from the same 
group as Rubio et al. used the Atacama Path- 
finder Experiment telescope in Chile to push 
past this ‘metal barrier’. They observed CO 
emission from the Wolf-Lundmark-Melotte
(WLM) dwarf galaxy, which is part of the
same Local Group of galaxies as the Milky Way
(Fig. 1). WLM has a metal content’ only about
one-tenth that of the Milky Way, and about 
half that of the previous record holder® from 
which CO had been detected. The researchers 
showed that the CO emission from star-form- 
ing regions in WLM was faint compared with 
that from other tracers of gas and star-forma- 
tion activity. This implied that CO molecules 
in WLM traced only the densest, most opaque 
parts of an extended stellar nursery. 

Rubio et al. have now used the Atacama 
Large Millimeter/submillimeter Array 
(ALMA), the world’s most powerful milli- 
metre-wavelength telescope, to take well- 
resolved pictures of these regions in WLM. 
The authors’ images of CO emission reveal 
that the nurseries are confined to stunningly 
small clumps that presumably represent only 
the densest parts of the star-forming gas clouds 
(see Fig. 1 of Rubio and co-workers’ paper’). By 
contrast, CO emission pervades star-forming 
regions of the Milky Way, such as the Orion 
molecular cloud”*. The images enabled the 
authors to directly measure the CO-emitting 
clouds’ sizes (about 3 parsecs across). They also 
measured the clouds’ kinetic energies, because 
radio telescopes can track the motion of CO 
gas by measuring the shift of the frequencies 
of its emission lines relative to their rest val-
ues (the Doppler effect). On the basis of these
measurements, Rubio et al. argue that the phys- 
ical properties (density, pressure and self-grav- 
ity) of the CO-containing clouds in the WLM 
galaxy do resemble those of similarly sized 
clumps present in metal-rich locales such as 
the Sun’s neighbourhood — even though most 
of a given star-forming cloud in WLM is invis- 
ible in CO emission. 
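The step from measured frequencies to gas motions and kinetic energies can be sketched as follows. This is a generic illustration, not Rubio and colleagues' analysis, and it rests on several assumptions. It uses the radio velocity convention. It takes the CO J = 1→0 line (rest frequency 115.2712018 GHz) purely as an example, since the text does not say which line was used. It assumes a Gaussian line profile when converting line width to velocity dispersion, and isotropic internal motions for the kinetic energy (3/2)Mσ². The function names are our own.

```python
import math

C_KM_S = 299_792.458           # speed of light, km/s
CO_1_0_REST_GHZ = 115.2712018  # CO J=1->0 rest frequency (illustrative choice)


def radio_velocity(observed_ghz, rest_ghz=CO_1_0_REST_GHZ):
    """Line-of-sight velocity (km/s) in the radio convention,
    v = c * (rest - observed) / rest; positive means the gas is receding."""
    return C_KM_S * (rest_ghz - observed_ghz) / rest_ghz


def fwhm_to_sigma(fwhm_km_s):
    """Convert the full width at half maximum of a Gaussian line
    to a one-dimensional velocity dispersion (same units)."""
    return fwhm_km_s / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def kinetic_energy_joules(mass_kg, sigma_km_s):
    """Kinetic energy of internal motions, (3/2) * M * sigma**2,
    assuming isotropic motions with 1D dispersion sigma."""
    return 1.5 * mass_kg * (sigma_km_s * 1000.0) ** 2


# A line observed slightly below its rest frequency implies receding gas.
v = radio_velocity(CO_1_0_REST_GHZ * (1 - 100 / C_KM_S))  # about 100 km/s
```

Resolving the line across a cloud gives its centroid velocity and its width, and the width (through σ) feeds the kinetic-energy estimate once a cloud mass is adopted.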

The authors argue that this similarity in 
physical properties helps to explain why star 
clusters born in metal-poor galaxies resem- 
ble those seen in less-extreme systems. In 
effect, they propose that the main impact of 
WLM’s lack of metals is to render the bulk of
the cloud difficult to see using CO. The lack
of dust in WLM means that our best tracer
of H₂ is present only deep in the cloud, and
the behaviour of most of the H₂ is perhaps not
so different from that in other ‘normal’ gal-
axies. This agrees, at least qualitatively, with
simulations and theoretical predictions for
the behaviour of CO and H₂ in metal-poor
galaxies’. The authors further speculate that
the small size of these dust-enshrouded, CO- 
emitting clumps may explain the relative pau- 
city of highly massive stellar clusters in small, 
isolated galaxies. 

The current study highlights a changing 
approach to studying star formation in low- 
metallicity systems. Modern telescopes have 
begun measuring the energetics, densities and 
turbulent character of metal-deficient stellar 
nurseries. This is a substantial advance on 
simply hunting for faint CO emission from 
such systems. ALMA is now operating full- 
time, so we could see exciting progress in this 
field in the coming years. 

However, the fundamental problem of 
knowing how much H₂ gas is present in metal-
poor systems remains daunting, especially
given this striking demonstration that CO 
inhabits only small, dense pockets of gas in the 
interiors of extended stellar nurseries. High- 
resolution observations of other gas tracers, 
including ionized and neutral carbon, and of 
dust (studied through its infrared emission and 
the attenuation of starlight that it causes) will 
be needed to piece together the structure of 
metal-poor clouds in detail. Rubio et al. have 
investigated these other tracers, but only at 
poor resolution that is not well matched to the 
tiny CO-emitting clouds found in WLM. 

Finally, it should be noted that only a hand- 
ful of clouds have been measured in a single 
system, but the star-formation process can be 
violent and random on small spatial scales. It 
will be fascinating to see if these first results 
are indeed indicative of a broader population 
of clouds in other low-metallicity galaxies.
How to catch rare
cell types


The development of an algorithm called RaceID enables the identification of rare 
cell types by single-cell RNA sequencing, even when they are part of a complex
mixture of similar cells. SEE LETTER P.251


LU WEN & FUCHOU TANG 


How many cell types are there in the
human body? Thanks to progress in
single-cell sequencing technologies,
scientists are now addressing this question in
a systematic and non-biased way. On page 251
of this issue, Grün et al.’ take this research for-
ward another step, describing an algorithm
called RaceID that can identify rare cell types
in a complex mixture of cells.

The transcriptome is the complete collection 
of RNA molecules present in a cell. Standard 
approaches to sequencing these assemblages 
provide an average view of the transcriptome 
across many cells, and so cannot provide infor- 
mation about differences between individual 
cells (heterogeneity), or about the character- 
istics of rare cell types within a heterogeneous 
population. Such analyses require single-cell 
transcriptome-sequencing technologies” *, and 
in the past few years it has become possible to 
acquire transcriptome data for hundreds and 
even thousands of single cells*°. Questions 
have been raised, however, about how reli- 
ably information can be mined from these 
huge data sets, particularly given that they
contain considerable technical noise’ owing to
inaccuracies in the techniques used. 

The epithelial cells that line the intestine 
absorb nutrients and defend the body against 
microorganisms. The epithelium contains 
six mature cell types, which are continually 
renewed by a small population of adult stem
cells*. This cell layer is one of the best-studied 
models of self-renewal and differentiation in 
adult stem cells, and many markers of distinct 
epithelial cell types have been character- 
ized. This makes it an invaluable system for 
developing techniques and algorithms 
for single-cell analysis. 

Grün et al. used a single-cell transcriptome-
sequencing technique to analyse 238 epithelial
cells obtained from mouse intestinal organoids
— ‘mini guts’ grown in vitro from a single stem
cell that contain every cell lineage of the intesti- 
nal epithelium. Using standard clustering algo- 
rithms, the authors distinguished three major 
cell populations (a rapidly dividing precursor 
population called transit amplifying cells, 
absorptive cells called enterocytes and secre- 
tory cells). One algorithm, K-means clustering, 
could distinguish several subgroups within the 
abundant enterocyte cell population, including 
early and late progenitors and mature cells.
However, none of the algorithms could distin- 
guish subgroups within the rare secretory-cell 
lineage, which was represented by only 20 cells 
in the sample. 
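For readers unfamiliar with K-means clustering, here is a generic textbook version (Lloyd's algorithm). It is not the code or preprocessing used by Grün et al. It assumes that each cell is represented as a plain numeric vector, uses squared Euclidean distance, and naively seeds the k centroids with the first k cells. Real single-cell pipelines normalize counts first and choose k and the distance measure with care.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. points: list of equal-length numeric sequences
    (e.g. per-cell expression vectors). Returns (labels, centroids)."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


cells = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(cells, 2)
```

Because every cell is placed according to its overall distance from a cluster centre, a handful of cells that differ in only a few genes can be absorbed into a larger cluster. This is the limitation described in the text.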

The secretory-cell lineage contains at least 
three cell types, one of which — the hormone- 
producing enteroendocrine cells — can be 
further divided into more than ten different 
subtypes according to the hormones that they 
secrete’. Enteroendocrine cells have key roles 
in maintaining gut homeostasis, and so an 
ability to distinguish the different subgroups 
is desirable. But because of the similarity of 
their transcriptomes, the subgroups could not 
be discriminated by standard algorithms in the 
authors’ initial analyses. 

To get around this limitation, Grün et al.
developed RaceID, a simple and clever algo- 
rithm that clearly distinguishes different secre- 
tory cell types. The algorithm assumes that a 
given cell type is likely to strongly express a 
certain number of cell-type-specific ‘outlier’ 
genes. Such genes can be identified if care is 
taken to exclude technical and biological noise 
(biological noise arises from differences in 
transcript expression between individual cells 
of the same type). RaceID identifies outlier cells 
in each cluster after a K-means clustering step. 
An outlier cell is defined as expressing a certain 
number of outlier genes at levels significantly 
exceeding the modelled noise. In this way,
identification of a cell type will not depend on global
cell-cell differences, as in standard clustering 
algorithms, but on only a few genes (Fig. 1). 
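The outlier test described above can be illustrated with a simplified sketch. It is not RaceID's actual implementation and rests on three assumptions:

- **Noise model.** A hypothetical variance function var(μ) = μ + αμ². The μ term stands in for Poisson-like technical noise and the αμ² term for biological overdispersion.
- **Outlier gene.** A gene is an outlier in a cell when its count exceeds the cluster mean by more than z modelled standard deviations. This is a Gaussian-style shortcut in place of a proper tail-probability test.
- **Outlier cell.** A cell is flagged when it has at least `min_genes` such genes.

The values of α, z and `min_genes` are illustrative only.

```python
from statistics import fmean


def modelled_variance(mu, alpha=0.5):
    """Hypothetical noise model: technical (Poisson-like, mu) plus
    biological overdispersion (alpha * mu**2)."""
    return mu + alpha * mu * mu


def outlier_genes(cell, cluster, z=3.0, alpha=0.5):
    """Indices of genes whose count in `cell` exceeds the cluster mean
    by more than z modelled standard deviations."""
    hits = []
    for g, count in enumerate(cell):
        mu = fmean(other[g] for other in cluster)
        threshold = mu + z * modelled_variance(mu, alpha) ** 0.5
        if count > threshold:
            hits.append(g)
    return hits


def outlier_cells(cluster, min_genes=2, z=3.0, alpha=0.5):
    """Indices of cells in one K-means cluster that express at least
    `min_genes` outlier genes."""
    return [
        i for i, cell in enumerate(cluster)
        if len(outlier_genes(cell, cluster, z, alpha)) >= min_genes
    ]


# Toy cluster: 19 similar cells plus one cell with two strongly expressed genes.
cluster = [[5, 5, 5, 5] for _ in range(19)] + [[5, 5, 80, 90]]
print(outlier_cells(cluster))               # [19]
print(outlier_genes(cluster[19], cluster))  # [2, 3]
```

Because the decision rests on a few genes far above the modelled noise, the single unusual cell is picked out even though it shares most of its profile with the rest of the cluster.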

Figure 1 | Race to identify rare cells. a, To attempt to differentiate cell types in a mixed population, standard clustering algorithms such as K-means clustering analyse all the RNA transcripts in each cell to determine global differences in gene expression. In this analysis, variations in gene expression between cells are reduced to a two-dimensional space. However, clustering algorithms often fail to identify which cells are of a rare type (oval). b, Grün et al.' have developed an algorithm, RaceID, that detects outlier genes that are expressed in a cell at a level significantly higher than a given threshold, which is based on the amount of both technical and biological noise. Combining a K-means clustering step with data on which cells express a certain number of outliers above the threshold level identifies cut-off points that enable the identification of rare cell types.

During single-cell sequencing, each RNA
transcript must be amplified many times to
provide enough material for accurate sequencing.
But the amplification step can introduce
technical noise, because small errors in
measuring the number of transcripts produced
from each gene in a cell are magnified during
replication. The authors exclude this noise 
using a previously reported technique’’ to 
add a unique molecular ‘barcode’ to each 
individual transcript before amplification. 
This enables the RaceID algorithm to deter- 
mine whether high levels of gene expression 
are real or an artefact of amplification. Grün
and colleagues demonstrated the efficiency
of this strategy using a pool-and-split
experiment. They pooled transcripts from 93 cells and
split the RNA into 93 equal samples, creating
93 ‘average’ single cells, then amplified
and sequenced each sample separately. No
false positive rare cell types were detected. 
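Barcode-based counting can be sketched as follows. The sketch assumes that reads have already been mapped to genes and that each read carries the barcode (a unique molecular identifier) attached to its transcript before amplification. Counting distinct barcodes per gene, rather than reads, means that a transcript copied many times during amplification is still counted once. This simplified version ignores sequencing errors in the barcodes and barcode collisions, which real pipelines must handle. The gene names come from the text, but the barcode sequences are made up.

```python
from collections import Counter, defaultdict


def count_reads(reads):
    """Naive count: every sequenced read adds one, so amplification bias leaks in."""
    return dict(Counter(gene for gene, _ in reads))


def count_molecules(reads):
    """Barcode-aware count: each distinct (gene, barcode) pair is one original transcript."""
    barcodes = defaultdict(set)
    for gene, barcode in reads:
        barcodes[gene].add(barcode)
    return {gene: len(codes) for gene, codes in barcodes.items()}


# Hypothetical reads from one cell: one Reg4 transcript was over-amplified.
reads = [("Reg4", "AACG"), ("Reg4", "AACG"), ("Reg4", "AACG"),
         ("Reg4", "GTTC"), ("Lgr5", "CCAT")]
print(count_reads(reads))      # {'Reg4': 4, 'Lgr5': 1}
print(count_molecules(reads))  # {'Reg4': 2, 'Lgr5': 1}
```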

RaceID identified the gene Reg4 as being
highly expressed specifically in
enteroendocrine cells. Grün et al. isolated and sequenced a
population of 161 Reg4-expressing cells. Using
RaceID, they identified new enteroendocrine
subtypes and validated them in vivo at the level 
of both RNA and protein. This confirmed that 
RaceID can be used for the identification of 
rare cell types. 

There has been much debate about whether 
the intestinal stem-cell population is
heterogeneous. Can RaceID find subtypes in this
population, which is marked by expression of
the gene Lgr5? Grün and colleagues sequenced
transcriptomes from 288 Lgr5-expressing cells. 
RaceID identified these cells as largely homo- 
geneous — the stem-cell population — mixed 
with a population of rare Lgr5-expressing 
secretory cells. However, as the authors note, 
it remains possible that the stem-cell popula- 
tion is heterogeneous, but that differences are 
below a level detectable even by RaceID. 

The major limiting factor for RaceID is the 
accuracy of single-cell sequencing. It is still not 
possible to measure low-level gene expression 
accurately in a single cell, and the technical 
noise for detection of such genes will be too 
high to identify outliers. The genes for the tran- 
scription factors that determine a given cell type 
are generally not expressed as highly as those 
encoding hormones, for instance. This might 
prevent RaceID from discerning potentially 
functionally important rare cell types in which 
the differentially expressed genes are likely to 
mainly encode transcription factors, and may 
explain the fact that Grün et al. were unable to
detect stem cells in the initial organoid analysis, 
because the cells express Lgr5 at low levels. 

The potential for falsely identifying new
rare cell types should also be considered. Care 
must be taken to avoid nucleic-acid cross-con- 
tamination or incomplete cell dissociation. It 
will be necessary to validate putative cell types 
at the RNA and even protein level. 

In terms of sensitivity, accuracy and 
comprehensiveness, current single-cell 
sequencing techniques and bioinformatics 
tools are far from perfect. This is particularly 
true when it comes to discovering rare cell 
types. But through the unremitting efforts of 
Grün et al. and others, in the near future we
may be able to chart a complete cell-lineage
map of the human body.
A moving target


An in silico, three-dimensional model of tumour evolution suggests that cell 
motility is a key factor in the initial growth of a tumour mass. The model also 
reveals the dynamics of mutation spread. SEE LETTER P.261 


NATALIA L. KOMAROVA 


Evolutionary thinking is becoming an
important tool to understand cancer,
and even to propose directions in the 
search for treatment strategies. In this issue, 
Waclaw et al.' (page 261) use mathematical 
modelling based on evolutionary principles 
to provide an explanation for the observed 
architecture of tumours, and to argue that 
cell migration might be the key to tumour 
shape, spread and drug resistance. This 
study opens up the possibility of treatments 
that target genes related to cell motility and 
adhesion, rather than the conventional targets 
of genes governing cell division, death and 
differentiation. 
Cancer is an unwanted evolutionary process 
whereby cells, driven by random mutations, 
escape the orchestrated behaviour of a 
functioning tissue and enter a phase of abnor- 
mal growth and, later, metastasis (tumour 
spread). We still lack understanding of many 
aspects of this complex process, and research- 
ers in different fields are collaborating to solve 
this ultimate riddle. Evolutionary biologists 
approach the study of tumours in a similar 
manner to the study of viruses, bacteria or ani- 
mals: they seek the mutations that give rise to 
the ever-changing variety of tumour cells, and 
they look at the forces of natural selection that 
allow certain mutants to proliferate, replace 
their competitors and give rise to new waves 
of evolutionary change. 

Figure 1 | Motility helps to explain tumour dynamics. Waclaw et al.’ present an in silico model of 3D tumour architecture over the course of tumour evolution. a, When cells in the model are unable to move, a slow-growing tumour of largely spherical shape is predicted. b, By contrast, motile cells lead to a faster-growing tumour with a conglomerate structure more similar to that seen in clinical examples.

Waclaw et al. combined methods from
evolutionary biology and ecology with current
knowledge of the molecular biology of
cancer to design a versatile mathematical model
in which tumour-cell populations undergo
evolutionary change, guided by realistic 
parameters. The authors used this model to 
study the growth laws of 3D in silico tumours, 
focusing on the tumours’ geometry and cel- 
lular composition. If the tumour cells are rela- 
tively immobile, then, as they proliferate and 
form a malignant mass, they crowd each other 
out and thus slow their own replication. Soon 
the tumour can grow only at its surface, which 
results in a relatively slow expansion (the mass 
grows as a cubic power of time). 
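The cubic growth law follows from a short argument. It is sketched here under simplifying assumptions that are not taken from the paper:

- the tumour is a sphere of radius R with constant cell density ρ;
- only cells in a surface shell of fixed thickness δ ≪ R divide, each at rate b.

Equating the growth in cell number implied by the expanding volume with the divisions in the shell gives:

```latex
\begin{aligned}
\frac{dN}{dt} &= 4\pi R^{2}\rho\,\frac{dR}{dt} && \text{(volume growth)}\\
\frac{dN}{dt} &= b\,\bigl(4\pi R^{2}\delta\,\rho\bigr) && \text{(only the surface shell divides)}\\
\Rightarrow\quad \frac{dR}{dt} &= b\,\delta, \qquad R(t) = R_{0} + b\,\delta\,t,\\
N(t) &= \tfrac{4\pi}{3}\rho\,\bigl(R_{0} + b\,\delta\,t\bigr)^{3} \;\propto\; t^{3} && (t \gg R_{0}/b\delta).
\end{aligned}
```

If instead every cell can keep dividing at rate b, for example because migration relieves crowding, then dN/dt = bN and N(t) = N₀e^{bt}. This is the exponential regime that the model produces once cells can move.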

But this slow growth cannot account for the 
relatively fast tumour expansion observed in 
many clinical studies. The resolution of this 
apparent paradox lies in cellular motility. 
By giving each cell in the model the ability 
to migrate, the researchers observed a much 
faster, exponential, growth, which also yielded 
a different, more realistic, tumour shape 
(Fig. 1). This result is consistent with the earlier 
proposition’ that migratory potential is a
component of a cell’s evolutionary fitness, in much
the same way as is its replicative potential. 
However, it was previously thought that cell 
migration was mostly involved in the invasion 
of tissues by tumours or in metastasis. The 
direct, pivotal role of cell motility in tumour 
growth was under-appreciated and can now be 
considered a valid treatment target. 

Another focus of the authors’ study was 
tumour composition. In particular, they 
asked how quickly a particular mutation 
can propagate in a mass of cancer cells, thus 
changing the tumour’s properties. Evolution- 
ary processes and their outcomes are largely 
shaped by the environment in which they 
take place. For example, evolution in a well- 
mixed, homogeneous medium takes place at 
a different pace from evolution in an environ- 
ment in which interactions are restricted by 
geometric constraints. And in the latter case, 
dimensionality is key. For example, it has 
been shown that inactivation of a tumour- 
suppressor gene (a two-hit evolutionary pro- 
cess in which the cells must first become less 
fit before becoming more fit) happens faster 
in 1D (a row of cells)** than in 2D (a layer), 
and this is in turn faster than in a fully mixed 
system with no spatial constraints*”’. By 
contrast, in two-step processes in which the 
intermediate mutant confers a slight selective 
advantage, the relationship is the opposite, 
and a non-spatial, fully mixed environment 
promotes the fastest pace of evolution’. These 
phenomena seem less surprising if one notes 
how reminiscent they are of other fundamental 
laws of nature in which space dimensionality 
changes how things work, such as the different 
fundamental solutions of Poisson’s equation
in 1D and 2D. 
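For reference, the standard fundamental solutions G of Poisson's equation make the analogy concrete. The same equation yields a linearly growing potential in 1D, a logarithmic one in 2D and a decaying one in 3D:

```latex
\nabla^{2} G = \delta(\mathbf{x}), \qquad
G(\mathbf{x}) =
\begin{cases}
\dfrac{|x|}{2} & \text{in 1D},\\[6pt]
\dfrac{1}{2\pi}\ln|\mathbf{x}| & \text{in 2D},\\[6pt]
-\dfrac{1}{4\pi|\mathbf{x}|} & \text{in 3D}.
\end{cases}
```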

Waclaw et al. then set out to understand 
why, given the high overall degree of tumour 
heterogeneity, some mutations are so prevalent 
among the cells of a given tumour. In the con- 
text of tumour progression, two broad classes 
of mutation have been identified®. Cells with 
driver mutations are characterized by having 
a growth advantage over other cells, and such 
mutations are thought to be responsible for 
cancer initiation and progression. Passenger 
mutations are genomic changes that do not 
really alter the cells’ growth properties, and 
do not have a causal role in cancer origin or 
progression. Waclaw and colleagues show 
that, in the presence of even a small amount 
of selective advantage (that is, a driver muta- 
tion), the affected cells sweep rapidly through a 
3D cell population. This explains the observed 
composition of large tumours, in which almost 
every cell contains the same driver mutations, 
and heterogeneity resulting from passenger 
mutations accumulates later, during tumour 
progression. 

This idea is crucial in the context of cancer 
therapy. A mutation that confers resistance to 
a drug is usually a passenger mutation before 
treatment; such mutations are ‘hiding’ inside 
any tumour and are generated simply by 
chance as a result of the constant background 
mutation rate. But resistant mutants immedi- 
ately gain a selective advantage once treatment 
is applied. Waclaw and colleagues’ paper illus- 
trates how rapidly resistant cells can accumu- 
late, leading to regrowth and treatment failure. 
This happens even faster in the presence of 
mutations that increase cellular motility. 
How far are we from being able to use
evolution to our advantage? Understanding 
evolution’s intricate ways brings us a step closer 
to being able to reverse malignant processes, 
and to channel the dynamics in the direction 
we want. And can we use the genes responsi- 
ble for cell motility or cell adhesion as targets 
in future cancer treatments? Waclaw and col- 
leagues’ theoretical study suggests that this is a 
possibility, and it is to be hoped that others will 
take up the challenge.
Mutant p53 and
chromatin regulation 


The finding that genes encoding enzymes that modify histone proteins are 
among the targets of certain mutant forms of the p53 protein sheds light on how 
these mutations cause cancer beyond p53 inactivation. SEE ARTICLE P.206 


CAROL PRIVES & SCOTT W. LOWE 


Mutations in the TP53 tumour-suppressor gene
are common in human tumours. Although these
mutations invariably inactivate the normal
activity of p53 (ref. 1), which is the transcrip- 
tion factor encoded by TP53, some mutations 
also endow p53 with ‘gain-of-function’ activities
that promote cancer’. Whether diverse
p53 mutants produce similar gain-of-func- 
tion activities and how they do so remains a 
puzzle, but finding the answer might enable 
the design of strategies for treating many 
cancers. On page 206 of this issue, Zhu et al.³
provide a possible explanation: they show that 
gain-of-function mutant p53 proteins induce 
the production of enzymes that modify the his- 
tone proteins around which DNA is packaged 
as chromatin, thus altering gene expression. 
Experimentally altering the expression 
of gain-of-function mutant p53 affects the 
expression of myriad genes, enhancing the 
invasiveness and proliferation of tumour 
cells in vitro’. Moreover, mice harbouring key 
gain-of-function mutations in TP53 develop 
tumours that differ from those lacking p53 
(ref. 5). Lowering the level of gain-of-func- 
tion p53 has antiproliferative effects in vitro 
and can reduce metastasis or trigger tumour 
regression in vivo~”*. A better understanding 
of these mutants is therefore desirable. Zhu 
et al. found that, in cultured human-cancer 
cell lines, gain-of-function mutant forms of 
p53 bind to different regions of DNA from 
the normal protein. In particular, the mutant 
proteins bind to the genes MLL1 and MLL2.
Gain-of-function p53 seems to be recruited to
these genes in part through binding to ETS2,
a transcription factor that is known’ to target
gain-of-function p53 to different genes from
those activated by normal p53.

Figure 1 | Gaining on p53. Gain-of-function mutations in the tumour-suppressor gene TP53 enable the transcription factor that it encodes to bind to abnormal targets, leading to cancer. Zhu et al.³ report that gain-of-function p53 binds to the transcription factor ETS2 and activates the genes MLL1, MLL2 and MOZ. MLL1 and MLL2 encode MLL enzymes that add methyl groups (Me) to the histone proteins around which genes are packaged as chromatin, and MOZ adds acetyl groups (Ac) to these histones. Both modifications increase local gene expression, leading to an increase in the proliferation of cancer cells through as-yet-unknown mechanisms.

MLL1 and MLL2 are members of the SET
family of histone methyltransferase enzymes’. 
These act as parts of large complexes’ to mod- 
ulate gene expression by attaching methyl 
groups to a lysine amino-acid residue (K4) 
of the histone H3 protein. Such H3K4 meth- 
ylation allows increased transcription of the 
gene packaged around the histones. The 
authors found that gain-of-function p53 
also activates expression of the gene MOZ, 
which encodes an enzyme that adds an acetyl 
group to K9 of H3, again allowing increased 
gene expression. 

In agreement with the idea that gain-of- 
function p53 affects histone modification, 
reducing levels of the mutant p53 decreased 
H3K9 acetylation. However, p53 reduction 
had only a small effect on H3K4 methylation. 
Perhaps, as Zhu and colleagues suggest, this 
is because other members of the SET family 
have similar but p53-independent roles to 
MLL1 and MLL2. The authors demonstrated
that gain-of-function p53 activates MLL1, 
MLL2 and MOZ, and showed that this acti- 
vation is partly responsible for the ability of 
the mutant p53 to enhance cell proliferation 
in vitro (Fig. 1). 

Finally, mining human-cancer databases 
provided support for Zhu and colleagues’ 
data, indicating that expression of MLL1, 
MLL2 and MOZ is significantly upregulated in
human tumours with select p53 gain-of-func- 
tion mutants compared with tumours lacking 
p53 or those without mutant p53. This corre- 
lation is not obvious across breast cancers”®, 
the tissue of origin for many cell lines studied 
in this work — although Zhu et al. clearly 
show that such a correlation exists in the cell 
lines that they used. It is likely that other vari- 
ables affect MLL expression in some tumour 
types. As a result, it will be important to inves- 
tigate the contextual factors that determine 
whether or when gain-of-function p53 can 
trigger changes in MLL and MOZ expression, 
and to analyse the mechanisms underlying 
these events. 

Gain-of-function p53 was also recently 
shown"' to act with the SWI/SNF chromatin- 
remodelling complex to upregulate many 
genes that can themselves mediate the cancer- 
causing activities of gain-of-function p53. 
This finding, taken together with Zhu and 
colleagues’ demonstration that this p53 is 
linked to chromatin and, by extension, to the 
transcriptome (the complete gene-expression 
profile of the cell), could explain why so many 
genes are affected by the presence of gain-of- 
function p53. But precisely how p53 proteins 
with diverse mutations acquire similar capa- 
bilities remains to be discovered. 

One possibility is that p53 mutants adopt 
a different structure from normal p53 that 
enables their interaction with ETS2. However, 
there is no obvious explanation for the evo- 
lution of such an interaction, and this model 
is at odds with the observation” that some 
gain-of-function p53 proteins have similar 
structures to normal p53. Alternatively, the 
ability of normal p53 to bind to thousands of 
sites in the human genome might prevent it 
from associating with the factors with which 
gain-of-function p53 interacts. Or perhaps the 
expression of one or more target genes some- 
how actively prevents the normal protein from 
engaging in the interactions that are character- 
istic of the mutant p53. 

The authors’ link between gain-of-function
p53 and MLL1 and MLL2 is intriguing, given
that members of the MLL family
are frequently mutated in human cancers”. 
For example, chromosomal translocations 
involving MLL1 can drive leukaemia, and 
MLL2 mutations are common in several
carcinomas. But MLL1 translocations eliminate
the gene’s histone methyltransferase domain, 
and MLL2 mutations seem to be inactivating. 
The explanation for this apparent discrepancy 
with Zhu and colleagues’ findings is unclear, 
but probably reflects context-dependent
differences in enzyme function. 

Could targeting MLL1, MLL2 or MOZ be a
strategy for treating tumours involving gain- 
of-function p53? Zhu et al. showed that two 
inhibitors of MLL-complex formation block 
proliferation in cells expressing mutant p53, 
but do not affect those lacking p53. Eliminat- 
ing gain-of-function p53 or interfering with 
its mechanism of action can have anticancer 
effects in vitro and in mice*!*’. Moreover, 
there is much enthusiasm for cancer treat- 
ments that affect chromatin modification, 
and compounds that target some chromatin- 
modifying activities have been approved for 
use in the clinic or are currently in clinical 
trials. But more work is required, because 
the specificity of the MLL inhibitors is not 
entirely established. Furthermore, MLL genes 
are active during embryonic development, and 
their inhibition can cause embryonic death, 
independent of TP53 mutations’. These 
observations, together with the previously 
mentioned fact that mutations disrupting 
MLL function are common in tumours, raise 
concerns that MLL inhibitors might be toxic, 
or might even promote tumours. 

Nonetheless, given the frequency with which 
TP53 is mutated in cancer, continued efforts to 
modulate the effects of mutant p53 are clearly 
warranted. With the key targets of MLL and 
MOZ in hand, specific therapies might become
possible. Thus, Zhu and colleagues’ study, and 
those of others, might point to treatments for 
tumours that harbour TP53 mutations.
Mapping tree density at a global scale


T. W. Crowther, H. B. Glick, K. R. Covey, C. Bettigole, D. S. Maynard, S. M. Thomas, J. R. Smith, G. Hintler, M. C. Duguid, G. Amatulli, M.-N. Tuanmu, W. Jetz, C. Salas, C. Stam, D. Piotto, R. Tavani, S. Green, G. Bruce, S. J. Williams, S. K. Wiser, M. O. Huber, G. M. Hengeveld, G.-J. Nabuurs, E. Tikhonova, P. Borchardt, C.-F. Li, L. W. Powrie, M. Fischer, A. Hemp, J. Homeier, P. Cho, A. C. Vibrans, P. M. Umunay, S. L. Piao, C. W. Rowe, M. S. Ashton, P. R. Crane & M. A. Bradford


The global extent and distribution of forest trees is central to our understanding of the terrestrial biosphere. We provide 
the first spatially continuous map of forest tree density at a global scale. This map reveals that the global number of trees is 
approximately 3.04 trillion, an order of magnitude higher than the previous estimate. Of these trees, approximately 
1.39 trillion exist in tropical and subtropical forests, with 0.74 trillion in boreal regions and 0.61 trillion in temperate 
regions. Biome-level trends in tree density demonstrate the importance of climate and topography in controlling local 
tree densities at finer scales, as well as the overwhelming effect of humans across most of the world. Based on our 
projected tree densities, we estimate that over 15 billion trees are cut down each year, and the global number of trees has 
fallen by approximately 46% since the start of human civilization. 


Forest ecosystems harbour a large proportion of global biodiversity, 
contribute extensively to biogeochemical cycles, and provide count- 
less ecosystem services, including water quality control, timber 
stocks and carbon sequestration’ *. Our current understanding of 
the global forest extent has been generated using remote sensing 
approaches that provide spatially explicit values relating to forest 
area and canopy cover*”®. Used in a wide variety of global models, 
these maps have enhanced our understanding of the Earth sys- 
tem***, but they do not currently address population numbers, 
densities or timber stocks. These variables are valuable for the mod- 
elling of broad-scale biological and biogeochemical processes’° 
because tree density is a prominent component of ecosystem struc- 
ture, governing elemental processing and retention rates”*””, as well 
as competitive dynamics and habitat suitability for many plant and 
animal species'”’. 

The number of trees in a given area can also be a meaning- 
ful metric to guide forest management practices and inform 
decision-making in public and non-governmental sectors’*’*. For 
example, international afforestation efforts such as the ‘Billion 
Trees Campaign’, and city-wide projects including the numerous 
‘Million Tree’ initiatives around the world have motivated civil 
society and political leaders to promote environmental stewardship 
and sustainable land management by planting large numbers of 
trees'*'*'’. Establishing targets and evaluating the proportional
contribution of such projects requires a sound baseline understand- 
ing of current and potential tree population numbers at regional 
and global scales'*””. 


The current estimate of global tree number is approximately 
400.25 billion’*. Generated using satellite imagery and scaled based 
on global forest area, this estimate engaged policy makers and envir- 
onmental practitioners worldwide by suggesting that the ratio of 
trees-to-people is 61:1. This has, however, been thrown into doubt 
by a recent broad-scale inventory that used 1,170 ground-truthed 
measurements of tree density to estimate that there are 390 billion 
trees in the Amazon basin alone”. 


Mapping tree density 
Here, we use 429,775 ground-sourced measurements of tree density 
from every continent on Earth except Antarctica to generate a global 
map of forest trees. Forested areas are found in most of Earth’s 
biomes, even those as counterintuitive as desert, tundra, and grassland 
(Fig. 1a, b). We generated predictive regression models for the
forested areas in each of the 14 biomes as defined by The Nature
Conservancy (http://www.nature.org). These models link tree density 
to spatially explicit remote sensing and geographic information sys- 
tems (GIS) layers of climate, topography, vegetation characteristics 
and anthropogenic land use (see Extended Data Table 1). Following 
almost all of the collected data sources, we define a tree as a plant with 
woody stems larger than 10 cm diameter at breast height (DBH)”. 
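Under this definition, a plot's tree density is the number of stems above the DBH cut-off divided by the plot area. The following is a minimal sketch. The plot size and stem diameters are hypothetical, and the strict "larger than 10 cm" comparison follows the wording above.

```python
def plot_density(dbh_cm, plot_area_ha, min_dbh_cm=10.0):
    """Trees per hectare in one plot, counting only stems with DBH > min_dbh_cm."""
    trees = sum(1 for d in dbh_cm if d > min_dbh_cm)
    return trees / plot_area_ha


# Hypothetical 50 m x 50 m (0.25 ha) plot; the 4.5 cm and 10.0 cm stems are excluded.
stems = [4.5, 12.0, 10.0, 35.2, 18.7]
print(plot_density(stems, 0.25))  # 12.0
```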

Figure 1 | Map of data points and raw biome-level forest density data. a, Image highlighting the ecoregions (shapefiles provided by The Nature Conservancy (http://www.nature.org)) from which the 429,775 ground-sourced measurements of tree density were collected. Shading indicates the total number of plot measurements collected in each ecoregion. A global forest map was overlaid in green to highlight that collected data span the majority of forest ecosystems on a global scale. b, The median and interquartile range of tree density values collected in the forested areas of each biome.

Figure 2 | Heat plots showing the relationships between predicted and measured tree density data. a–l, Predictions were generated using generalized linear models (n = 429,775). Diagonal lines indicate 1:1 lines (perfect correspondence) between predicted and observed points, scaled to the kilometre level. Colours indicate the proportion of data points from that biome that fall within each pixel. Biomes with a greater number of plot measurements have greater variability but higher confidence in the mean estimates, highlighting the trade-off between broad-scale precision and fine-scale accuracy. Axes are log-transformed to account for exceptionally high variability in tree density.

Figure 3 | Validation plots for biome-level predictions. a, Biome-level regression models predict the mean values of the omitted validation plot measurements in 12 biomes. Overall, the models underestimated mean tree density by ~3% (slope = 0.97) but this difference was not statistically significant (P = 0.51). Bars show ± one standard deviation for the predicted mean and the grey area represents the 95% confidence interval for the mean. The values plotted here represent mean densities for the plot measurements (that is, for forested ecosystems), rather than those predicted for each entire biome. b, The standard deviation of the predicted mean values as a function of sample size. As sample size increases, the variability of the predicted mean tree density reaches a threshold, beyond which an increase in sample size results in a minimal increase in precision. Standard deviations were calculated using a bootstrapping approach (see Methods), and smooth curves were modelled using standard linear regression with a log–log transformation.

Because the data set incorporates plot-level measurements from more than 50
countries, the measured tree density values were inherently variable within
and among biomes (Figs 1 and 2). However, the large number of tree
density measurements ensured that the confidence in our mean (and
total) estimates is high (Fig. 3). Furthermore, the scale of these data
ensures that our modelled estimates are unlikely to be influenced
significantly by recent forest loss, reforestation or natural forest
regeneration, which are responsible for a net global change of <1% of the
global forest area each year’. Biome-level validation estimates indicate
that our models have high precision when predicting the mean tree
densities of omitted validation plots (Fig. 3a). Although the accuracy
of our models is limited at the level of an individual hectare, the 
precision of the mean density estimates is high (±40 trees ha⁻¹)
beyond a threshold of ~200 plots (Fig. 3b). 
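One way to see why precision plateaus is that the standard deviation of a sample mean shrinks roughly as 1/√n. Each extra plot therefore buys less precision than the one before. The bootstrap idea can be sketched on synthetic data: resample n plots with replacement many times, then take the standard deviation of the resampled means.

This is a simplified illustration under stated assumptions:

- the plot densities are hypothetical and log-normally distributed;
- it bootstraps raw plot means rather than the model predictions used in the study;
- its numbers are not the study's results.

```python
import random
import statistics


def bootstrap_sd_of_mean(values, n, reps=500, seed=0):
    """SD of the mean of n values resampled with replacement, over `reps` draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(reps)]
    return statistics.stdev(means)


# Hypothetical right-skewed plot densities (trees per ha); not the study's data.
rng = random.Random(42)
plots = [rng.lognormvariate(6.0, 0.8) for _ in range(5000)]

for n in (25, 100, 200, 400, 800):
    # Each doubling of n cuts the SD by only about 1/sqrt(2), so gains diminish.
    print(n, round(bootstrap_sd_of_mean(plots, n), 1))
```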


Global-level and biome-level patterns 

Together, the biome-level models provide the first spatially continuous
map of global tree densities at a 1-km² (30 arc-seconds) resolution
(Fig. 4a). Based on this map, we estimate that the global number of trees
is approximately 3.04 trillion (±0.096 trillion, 95% confidence interval
(CI)). Although this is an order of magnitude higher than the previous
global estimate’’, the scale of our projection is consistent with recent
large-scale inventories in Europe, North America and the Amazon basin’”
(Fig. 4d). With a
human population of 7.2 billion, our estimate of global tree density 
revises the ratio of trees per person from 61:1 to 422:1. 
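The revised ratio is a direct division of the new global estimate by the human population quoted above. A quick check:

```python
global_trees = 3.04e12     # trees (this study's global estimate)
human_population = 7.2e9   # people

trees_per_person = global_trees / human_population
print(round(trees_per_person))  # 422
```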

At the biome level, the highest tree densities exist in forested
regions of the Boreal and Tundra zones (Fig. 1b). In these northern 
latitudes, limited temperature and moisture lead to the establishment 
of stress-tolerant coniferous tree species that can reach the highest 
densities on Earth (Fig. 1). However, the tropical regions contain a 
greater proportion of the world’s forested land. A total of 42.8% of the 
planet’s trees exist in tropical and subtropical regions, with another 
24.2% and 21.8% in boreal and temperate biomes, respectively 
(Fig. 4a). 


Within-biome trends 


Our models also provide mechanistic insights into potential con- 
trols on tree density within biomes (Fig. 5). For example, various 
climatic parameters correlate with mean forest density within all 
ecosystem types. Tree density generally increases with temperature 
(mean annual temperature and temperature seasonality) and mois- 
ture availability (precipitation regimes, evapotranspiration or arid- 
ity). These patterns are consistent with previous broad-scale tree 
inventory studies and support the idea that, within ecosystem 
types, moist, warm conditions are generally optimal for tree 
growth'??, 

Given the generally positive effects of moisture availability and
warmth on tree density within biomes, the negative relationships
observed in some regions may seem surprising (Fig. 5). This highlights
the complex suite of population- and community-level selection
pressures that can obscure the expected effects of climate across
landscapes. For example, in colder (boreal or tundra) biomes,
increasing moisture levels can cause hydric and permafrost conditions
in lower-lying topographies, which then limit nutrient availability
for tree development. In addition, current and historical
anthropogenic land use decisions have the potential to drive these
relationships in several regions. The negative relationships between
tree density and moisture availability in flooded grasslands and
tropical dry forests, for example, are likely to be driven by
preferential use of moist, productive land for agriculture. As a
result, forest ecosystems are often relegated to drier regions,
reversing the expected within-biome relationships between moisture
availability and tree density. Such effects will vary among countries,
depending on human population densities, alternative resource
availability and socio-economic status.

Along with these indirect effects of human activity, the direct
effect of human development (percentage developed and managed
land) on tree density represented the only common mechanism
across all biomes (Fig. 5). The negative relationships between tree
density and anthropogenic land use exemplify how humans contend
directly with natural forest ecosystems for space. Whereas the negative
effect of human activity on tree numbers is highly apparent at
local scales, the present study provides a new measure of the scale of
anthropogenic effects, relative to other environmental variables.
Current rates of global forest cover loss are approximately
192,000 km² each year. By combining our tree density information
with the most recent spatially explicit map of forest cover loss over
the past 12 years, we estimate that deforestation, forest management,
disturbances and land use change are currently responsible
for a gross loss of approximately 15.3 billion trees on an annual basis.
Although these rates of forest loss are currently highest in tropical
regions, the scale and consistency of this negative human effect
across all forested biomes highlight how historical land
use decisions have shaped natural ecosystems on a global scale.
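The gross-loss estimate comes from overlaying predicted tree density on a map of forest-cover loss. A minimal sketch of that overlay, assuming each pixel carries a lost forest area (ha per year) and a predicted density (trees per ha); the function name and pixel values are hypothetical. The last lines divide the two headline figures quoted above, which gives the mean density they jointly imply (roughly 800 trees per ha) only if both figures refer to the same lost area.

```python
def annual_tree_loss(lost_area_ha, density_per_ha):
    """Gross annual tree loss: sum over pixels of lost forest area x predicted density."""
    if len(lost_area_ha) != len(density_per_ha):
        raise ValueError("inputs must describe the same pixels")
    return sum(a * d for a, d in zip(lost_area_ha, density_per_ha))


# Hypothetical example with three pixels (ha lost per year, trees per ha).
print(annual_tree_loss([10.0, 0.0, 2.5], [400.0, 900.0, 1200.0]))  # 7000.0

# Mean density implied by the headline figures quoted in the text.
LOSS_KM2_PER_YEAR = 192_000
TREES_LOST_PER_YEAR = 15.3e9
implied_density = TREES_LOST_PER_YEAR / (LOSS_KM2_PER_YEAR * 100)  # 1 km² = 100 ha
print(round(implied_density))  # 797
```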
Using the projected maps of current and historic forest cover provided
by the United Nations Environment Programme (http://geodata.grid.unep.ch),
our map reveals that the global number of trees has fallen by approximately
45.8% since the onset of human civilization (post-Pleistocene).

Figure 4 | The global map of tree density at the 1-km² pixel (30 arc-seconds)
scale. a, The scale refers to the number of trees in each pixel. b, c, We highlight
the map predictions for two areas (South American Andes (b) and Sardinia
(c)) and include the corresponding images for visual comparison. All maps and
images were generated using ESRI basemap imagery. d, A scatterplot as
validation for our broad-scale estimates of total tree number. This shows the
relationship between our predicted tree estimates and reported totals for
regions with previous broad-scale tree inventories (see Methods for details).
The straight line and the dotted line are the predicted best-fit line and the 1:1
line, respectively.


Discussion


The global map of tree density can facilitate ongoing efforts to understand
biogeochemical Earth system dynamics by incorporating ecosystem features
that relate to elemental cycling rates. For example, tree abundance can help
to explain some of the variation in carbon storage and productivity within
ecosystem types, but the strength of these effects remains untested across
biomes. We assessed the relationship between tree density and plant carbon
storage at a global scale by regressing our plot-level tree counts against
modelled estimates of plant biomass carbon in those sites. This revealed a
positive effect of tree density on plant carbon storage (P < 0.001).
However, the relationship is weak (r² = 0.14), reflecting the vast array of
local ecological forces that can obscure such global trends. For example,
the effect of tree density is likely to interact strongly with tree size.
Larger trees contain the greatest proportion of carbon in woodlands, but the
highest tree densities within a given ecosystem type are often associated
with young or recovering forests characterized by many small trees. A
thorough understanding of total vegetative carbon storage requires
information about both the size and the number of individual trees.
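Here r² is the proportion of variance in modelled carbon explained by the regression on tree density, so r² = 0.14 means tree density accounts for about 14% of that variance. A plain-Python sketch of an ordinary least-squares fit and its r²; the text does not specify the exact regression form (for example, whether variables were transformed), so the sketch and its paired values are illustrative only.

```python
def ols_fit(x, y):
    """Slope, intercept and r² of an ordinary least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical paired values: tree count and modelled carbon at four plots.
print(ols_fit([1, 2, 3, 4], [1, 3, 2, 4]))
```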

A dense forest environment is a fundamentally different ecosystem
from a sparse one, and this influences a vast array of biotic and
abiotic processes. Current remote sensing tools capture some, but not
all, of this information. The tree density layer that we provide can
therefore augment the currently available layers by providing unique
insights into ecological dynamics that are not represented by estimates
of forest cover or biomass. It can inform biodiversity estimates and
species distribution models by capturing perceivable environmental
characteristics that determine habitat suitability for a wide variety
of plants and animals. Baseline estimates of tree populations are also
critical for projecting population- and community-level tree
demographics under current and future climate change scenarios, and
for guiding local, national and international reforestation and
afforestation efforts. Finally, by allowing us to comprehend the global
forest extent in terms of tree numbers, this map contributes to our
fundamental understanding of the Earth’s terrestrial system.
Figure 5 | Standardized coefficients for the variables included in final
biome-level regression models. Coefficients represent relative per cent 
change in tree density for one standard deviation increase in the variable. Red 
and blue circles indicate negative and positive effects on tree density, 
respectively. Circle size indicates the magnitude of effects. All layers are 
available at the global scale. Human development = per cent developed and 
managed land; LAI = leaf area index; EVI = enhanced vegetation index; EVI: 
ASM = angular second moment of EVI; EVI: contrast = contrast of EVI; and 
EVI: dissimilarity = dissimilarity of EVI (see Extended Data Table 1). 
METHODS


Data collection and standardisation. Plot-level data were collected from
international forestry databases, including the Global Index of Vegetation-Plot
Databases (GIVD http://www.givd.info), the Smithsonian Tropical Research
Institute (http://www.stri.si.edu), ICP-Level-I plot data, which cover most of
Europe (http://www.icp-forests.org), and National Forest Inventory (NFI)
analyses from 21 countries, including the USA (http://fia.fs.fed.us/) and Canada
(https://nfi.nfis.org/index.php). This information was supplemented with data
from peer-reviewed studies reporting large international inventories published
in the last 10 years (collected using ISI Web of Knowledge, Google Scholar and
secondary references).

We only included density estimates where individual trees met the criterion of
≥10 cm diameter at breast height (DBH). Although NFI databases can vary
slightly in their definition of a mature tree (for example, the US Forest Service
Forest Inventory and Analysis (FIA) defines a tree as a plant with woody stems
larger than 12.7 cm DBH), the vast majority of sources use 10 cm as the DBH cut-off.
Indeed, this was the only size class provided by all broad-scale inventories
(including the FIA), so density estimates at other DBH values were excluded.
This provided a total of 429,775 measurements of forest tree density (each
generated at the hectare scale) that were then linked to spatially explicit
remote-sensing data and GIS variables to explore the patterns in forest tree
density at a global scale. The scale of our plot data (in terms of number and
distribution of plots) means that any plot location uncertainty or minor
changes in global forest area are unlikely to alter mean values or modelled
estimates.
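The DBH threshold matters because the same plot yields different densities under different cut-offs. A hypothetical sketch, assuming stem-level DBH records from a single plot (the compiled sources supplied density estimates, not necessarily stem lists), comparing the 10 cm criterion with the FIA's 12.7 cm definition:

```python
DBH_CUTOFF_CM = 10.0      # size criterion stated in the text
M2_PER_HECTARE = 10_000.0


def density_per_hectare(dbh_cm, plot_area_m2, cutoff_cm=DBH_CUTOFF_CM):
    """Trees per hectare in one plot, counting only stems with DBH >= cutoff_cm."""
    counted = sum(1 for d in dbh_cm if d >= cutoff_cm)
    return counted * M2_PER_HECTARE / plot_area_m2


stems = [5.0, 10.0, 12.7, 30.0]  # hypothetical DBH values (cm) in one 400 m² plot
print(density_per_hectare(stems, 400.0))                  # 75.0
print(density_per_hectare(stems, 400.0, cutoff_cm=12.7))  # 50.0 (FIA-style threshold)
```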
Acquisition and preprocessing of spatial data. For predictive model development,
we selected 20 geospatial covariates from a larger pool of potential covariates
based on uniqueness, spatial resolution and ecological relevance (Extended
Data Table 1). Covariates were derived through satellite-based remote sensing
and ground-based weather stations, and can be loosely grouped into one of four
categories: topographic, climatic, vegetative or anthropogenic. Topographic
covariates included elevation, slope, aspect (as northness and eastness), latitude
(as absolute value of latitude) and a terrain roughness index (TRI). Climatic
covariates included annual mean temperature, temperature annual range, annual
precipitation, precipitation of driest month, precipitation seasonality (coefficient
of variation), precipitation of driest quarter, potential evapotranspiration per
hectare per year, and indexed annual aridity. Vegetative covariates included
enhanced vegetation index (EVI), leaf area index (LAI), dissimilarity, contrast,
and angular second moment. We also included a single anthropogenic covariate:
proportion of urban and/or developed land cover (see Extended Data Table 1).
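Aspect is circular (1° and 359° face almost the same way), which is why it enters the models as two components. The text does not give the transform; the sketch below assumes the common convention northness = cos(aspect) and eastness = sin(aspect), with aspect in degrees clockwise from north, and also shows the absolute-latitude covariate. Function names are hypothetical.

```python
import math


def aspect_components(aspect_deg: float) -> tuple[float, float]:
    """Return (northness, eastness) for an aspect in degrees clockwise from north."""
    rad = math.radians(aspect_deg)
    return math.cos(rad), math.sin(rad)


def abs_latitude(lat_deg: float) -> float:
    """Latitude covariate expressed as distance from the equator in degrees."""
    return abs(lat_deg)


north, east = aspect_components(90.0)  # an east-facing slope: roughly (0.0, 1.0)
print(round(north, 6), round(east, 6))
print(abs_latitude(-33.5))             # 33.5
```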

Several covariates bear special mention. Moving-window analyses were applied
to an EVI derived from a multi-year composite of moderate resolution imaging
spectroradiometer (MODIS) imagery. From the result, we extracted three
second-order textural covariates that reflect the heterogeneity of vegetation,
intended to capture differences in vegetative structure. These include angular second
moment (the orderliness of EVI among adjacent pixels), contrast (the exponentially
weighted difference in EVI between adjacent pixels: see http://earthenv.org
for details), and dissimilarity (difference in EVI between adjacent pixels). Terrain
roughness index (the mean of absolute differences between a cell and its adjacent
neighbours) was derived from aggregated Global Multi-Resolution Terrain
Elevation Data of 2010. Terrain roughness index was computed using the eight
neighbouring pixels, while the others were computed using the four neighbouring
pixels located at 0°, 45°, 90° and 135° (see http://earthenv.org and ref. 36 for details).
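The two neighbourhood definitions above can be sketched with NumPy. This is an illustration under stated assumptions, not the earthenv.org processing chain: TRI is the mean absolute difference between each interior cell and its eight neighbours (edge cells are left as NaN), and the texture metrics come from a symmetric, normalised grey-level co-occurrence matrix (GLCM) pooled over the offsets at 0°, 45°, 90° and 135°, for an EVI window already quantised to integer levels. Contrast here uses the conventional squared-difference weighting; the text's "exponentially weighted" wording may describe something different.

```python
import numpy as np

# (row, column) offsets for the 0°, 45°, 90° and 135° neighbours.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def terrain_roughness_index(dem):
    """Mean absolute difference between each interior cell and its 8 neighbours."""
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    centre = dem[1:-1, 1:-1]
    total = np.zeros_like(centre)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = dem[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
            total += np.abs(neighbour - centre)
    tri = np.full_like(dem, np.nan)  # edge cells lack a full neighbourhood
    tri[1:-1, 1:-1] = total / 8.0
    return tri


def glcm_textures(levels, n_levels):
    """ASM, contrast and dissimilarity of a window of integer grey levels."""
    img = np.asarray(levels, dtype=int)
    rows, cols = img.shape
    p = np.zeros((n_levels, n_levels))
    for dr, dc in OFFSETS:
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    a, b = img[r, c], img[r2, c2]
                    p[a, b] += 1  # count each pair in both orders
                    p[b, a] += 1  # so the matrix is symmetric
    p /= p.sum()
    i, j = np.indices(p.shape)
    asm = float(np.sum(p ** 2))
    contrast = float(np.sum(p * (i - j) ** 2))
    dissimilarity = float(np.sum(p * np.abs(i - j)))
    return asm, contrast, dissimilarity


print(terrain_roughness_index([[0, 0, 0], [0, 8, 0], [0, 0, 0]])[1, 1])  # 8.0
print(glcm_textures([[0, 1], [1, 0]], n_levels=2))
```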

We preprocessed all spatial covariates using ArcMap 10.1 (ESRI, Redlands,
CA, 2012) and RStudio 0.97.551 (RStudio, 2012). All covariates were reprojected
to the interrupted Goode Homolosine equal-area coordinate system (which
maximises spatial precision by amalgamating numerous region-specific equal-area
projections) to optimize the areal accuracy of our final figures. These were then
resampled to match the coarsest resolution used during analysis (nominal 1-km²
pixels), and spatially coregistered using nearest-neighbour resampling where
necessary.
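Nearest-neighbour resampling copies original cell values instead of interpolating them. A minimal NumPy sketch for two grids that share the same extent and differ only in cell count; real reprojection to the Goode Homolosine grid (done here in ArcMap) also involves a coordinate transformation that this sketch omits, and the function name is hypothetical.

```python
import numpy as np


def nearest_neighbour_resample(grid, out_shape):
    """Resample a 2-D grid to out_shape using the input cell under each output cell centre."""
    grid = np.asarray(grid)
    rows_in, cols_in = grid.shape
    rows_out, cols_out = out_shape
    # Map each output cell centre back into input index space and truncate.
    r_idx = ((np.arange(rows_out) + 0.5) * rows_in / rows_out).astype(int)
    c_idx = ((np.arange(cols_out) + 0.5) * cols_in / cols_out).astype(int)
    return grid[np.ix_(r_idx, c_idx)]


fine = np.arange(16).reshape(4, 4)  # hypothetical fine-resolution layer
print(nearest_neighbour_resample(fine, (2, 2)))
```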

To account for broad-scale differences in vegetation types, we developed spatial
models at the biome scale. Individual predictive models were generated within
each of 14 broad ecosystem types (delineated by the Nature Conservancy
http://www.nature.org) to improve the accuracy of estimates.

Statistical modelling. We used generalized linear models to generate predictive
maps of tree numbers within forested ecosystems for each biome. This approach
also enabled us to explore the mechanisms potentially governing patterns in
forest tree density within regions (Fig. 5). Due to the inherently interactive nature
of climate, soil and human impact factors across the globe, we predicted that there
would be pronounced non-independence within the full suite of biophysical
variables extracted from the compiled GIS layers. To account for this collinearity,
we performed ascendant hierarchical clustering using the hclustvar function in
R’s ClustOfVar package in each biome-level model. This analysis splits the
variables into different clusters (similar to principal components) in which all
variables correlate with one another. A single best ‘indicator’ variable is then
selected from each cluster, based on its squared loading, that is, its squared
correlation with the central synthetic variable of the cluster (the first
principal component of a PCAmix analysis). This set of ‘best’ indicator variables
for each biome was then included in all subsequent models used to estimate
controls on forest tree density.
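The indicator-selection step can be sketched as follows, assuming the variable clusters have already been formed (the hclustvar clustering itself is not reproduced) and that all variables in a cluster are numeric, in which case PCAmix reduces to an ordinary PCA of the standardised variables. The variable with the largest squared correlation with the cluster's first principal component is kept; names and data are hypothetical.

```python
import numpy as np


def cluster_indicator(data, columns):
    """Pick the variable with the largest squared correlation with the cluster's first PC.

    `data` maps variable names to 1-D arrays of equal length; `columns` lists
    the variables assigned to one cluster.
    """
    block = np.column_stack([np.asarray(data[c], dtype=float) for c in columns])
    z = (block - block.mean(axis=0)) / block.std(axis=0)  # standardise
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    synthetic = u[:, 0] * s[0]                             # first PC scores
    sq_loadings = [float(np.corrcoef(z[:, k], synthetic)[0, 1] ** 2)
                   for k in range(z.shape[1])]
    best = int(np.argmax(sq_loadings))
    return columns[best], sq_loadings


x = np.linspace(0.0, 1.0, 50)
cluster = {"a": x, "b": x.copy(), "c": x + 0.5 * np.sin(20 * x)}  # hypothetical
print(cluster_indicator(cluster, ["a", "b", "c"]))
```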

Using the resulting set of variables, we constructed generalized linear models
with a negative binomial error structure (to account for count data that could not
extend below zero) for each biome (Extended Data Figs 1, 2 and 3) and performed
multi-model dredging using the dredge function in R’s MuMIn package. This
function constructs all possible candidate sub-models nested within the global
model, identifies the most plausible subset of models for each data set, and then
ranks them according to corrected Akaike Information Criterion (AICc) values
and AICc weights (AICcw). We derived covariates, coefficients and
variance-covariance matrices for biome-level models through weighted averaging
of the dredged models whose cumulative AICc weights were at least
0.95 (ref. 33). Given the inherent sampling bias present in our plot data (tree
density estimates were only collected in forested ecosystems and non-forested
regions are under-represented), our modelling approach was used to generate
predictive estimates of forest tree density, and these estimates were subsequently
scaled based on the total area of forested land in each pixel (see ‘Spatial modelling’
for details).
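The ranking and averaging steps can be illustrated with the standard AICc formula and Akaike weights. In this sketch the AICc values are hypothetical inputs (fitting the negative binomial GLMs themselves is not shown), and the averaged coefficient uses the "full" convention, in which a model lacking the term contributes zero; MuMIn also reports a conditional average over only the models containing the term.

```python
import math


def aicc(log_lik: float, k: int, n: int) -> float:
    """Second-order (small-sample) Akaike Information Criterion."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert AICc scores into weights that sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


def confidence_set(models, threshold=0.95):
    """Keep the best-ranked models until their cumulative weight reaches `threshold`.

    `models` maps a model name to its AICc; returned weights are renormalised.
    """
    ranked = sorted(models.items(), key=lambda item: item[1])
    weights = akaike_weights([score for _, score in ranked])
    kept, cumulative = {}, 0.0
    for (name, _), w in zip(ranked, weights):
        kept[name] = w
        cumulative += w
        if cumulative >= threshold:
            break
    norm = sum(kept.values())
    return {name: w / norm for name, w in kept.items()}


def full_average(coefs, weights):
    """Weighted mean of one coefficient; models without the term count as 0."""
    return sum(w * coefs.get(name, 0.0) for name, w in weights.items())


# Hypothetical candidate models and their AICc values.
models = {"temp+precip": 100.0, "temp": 102.0, "precip": 110.0}
weights = confidence_set(models)
print(weights)
print(full_average({"temp+precip": 0.30, "temp": 0.25}, weights))
```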

Model validation and testing. We assessed the model fit by investigating the
bias and precision present when predicting mean tree density across an
aggregate number of plots. This approach allowed us to test how many
plots are required to ensure that the predicted mean (or total) forest density
has reasonable bias and precision. Within each biome, 20% of the plots were
randomly omitted before model fitting to serve as an independent data set for
model testing. Initial model validation was conducted using the biome-specific
regression models (obtained from the remaining 80% of the data) to
predict the tree density for each omitted plot. The mean predicted tree density
of the omitted data was then regressed against the mean observed tree
density of the omitted data for each biome (Fig. 2). In addition, a
bootstrapping algorithm was used to quantify the standard deviation of the mean
prediction as a function of sample size following ref. 34. For each biome,
we generated empirical bootstrap estimates of the standard deviation of the
predicted mean using random samples drawn from the withheld validation
plots. Specifically, for each biome a bootstrap sample of size n was selected,
with replacement, from the omitted data in that biome. The fitted regression
model for that biome (based on the 80% retained data) was used to predict the
tree density of each point, and the mean of the samples was calculated. This
process was repeated 10,000 times for each sample size (n = 10, 20, ..., 500),
and in each case the empirical standard deviation of the 10,000 sample means
was calculated and plotted (Fig. 2). Where the number of plot records in a
biome fell below the sample size threshold identified through bootstrapping,
we used models from the most similar biome available (in terms of phylogenetic
relatedness of the dominant tree species and mean tree density from
the few plot values collected). This was the case for the two smallest biomes:
‘mangroves’ (0.23% of land surface) and ‘tropical coniferous’ (0.46% of land
surface) forests, which used models from ‘tropical moist’ and ‘temperate
coniferous’, respectively.
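The bootstrap described above can be sketched in plain Python. Here `predictions` stands for model-predicted densities at the withheld plots (resampling plots and then predicting is equivalent to resampling their predictions), and the values are hypothetical. The standard deviation of the bootstrap mean shrinks roughly as 1/√n, which is how a sample-size threshold can be read off the resulting curve.

```python
import random
import statistics


def bootstrap_sd_of_mean(predictions, n, reps=10_000, seed=0):
    """Empirical SD of the mean of `n` values drawn with replacement from `predictions`."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(predictions, k=n)) for _ in range(reps)]
    return statistics.stdev(means)


predictions = [120.0, 340.0, 560.0, 410.0, 95.0, 780.0, 230.0, 660.0]  # hypothetical trees/ha
for n in (10, 50, 100):
    print(n, round(bootstrap_sd_of_mean(predictions, n, reps=2_000), 1))
```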

Spatial modelling. Following model averaging and bootstrapping, we applied the
final negative binomial regression equations used in bootstrapping to pixel-level
spatial data at the biome level. Regressions were run in a map algebra framework
wherein equation intercepts and coefficients were applied independently to each
pixel of our coregistered global covariates to produce a single map of forest tree
density on a per-hectare scale. We then scaled our per-hectare forest density
estimates to the 1-km² scale based on the total area of forested land within each
pixel, as estimated by the global 1-km consensus land cover data set for 2014
(ref. 6). This process was then validated using an older (2013) data set that used
fine-scale (30 m) forest cover information, which revealed equivalent total tree
counts. By multiplying our predicted forest density by the area of forest, we
ensured that we did not overestimate tree densities in non-forested sites. From
the resulting maps, summary statistics (mean tree density, total tree number)
were derived for each polygonal area of interest. The variances of the global and
biome-specific totals were calculated using a Taylor series approximation to
account for the log-link negative binomial regression function and correlation
among the regression-based predicted values.
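The per-pixel arithmetic and a first-order Taylor-series (delta-method) variance of a total can be sketched as below. The sketch assumes a log-link model, so predicted density is exp(xβ) trees per hectare, plus each pixel's forested hectares and a coefficient variance-covariance matrix V. For the total T = Σ a_p exp(x_p β), the approximation is Var(T) ≈ gᵀVg with gradient g = Σ a_p exp(x_p β) x_p; because all pixels share the same β, correlation among predicted values is carried through automatically. Function names and numbers are hypothetical.

```python
import numpy as np


def pixel_tree_counts(X, beta, forest_area_ha):
    """Trees per pixel: exp(X @ beta) trees/ha (log link) times forested hectares."""
    density = np.exp(np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float))
    return density * np.asarray(forest_area_ha, dtype=float)


def total_and_delta_variance(X, beta, vcov, forest_area_ha):
    """Total trees and its first-order (delta-method) variance."""
    X = np.asarray(X, dtype=float)
    counts = pixel_tree_counts(X, beta, forest_area_ha)
    grad = X.T @ counts  # dT/dbeta = sum_p a_p * exp(x_p beta) * x_p
    variance = float(grad @ np.asarray(vcov, dtype=float) @ grad)
    return float(counts.sum()), variance


# Hypothetical two-pixel example: intercept plus one standardised covariate.
X = [[1.0, 0.0], [1.0, 1.0]]
beta = [np.log(100.0), 0.0]       # 100 trees/ha everywhere
vcov = [[0.01, 0.0], [0.0, 0.0]]
area = [100.0, 50.0]              # forested hectares in each 1-km² pixel
print(total_and_delta_variance(X, beta, vcov, area))
```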

By generating models at the biome level, we were able to account for broad-scale
differences in vegetation types between biomes, while maintaining high
precision of our mean (and total) estimates at the global scale (due to the high
number of plot measurements within biomes). However, biome-level models are
limited in their accuracy when predicting tree density at fine scales, which might
ultimately have the potential to alter final numbers. We therefore constructed
models within each of 813 global ecoregions (delineated by the Nature
Conservancy http://www.nature.org) as a validation for the first biome-level
approach. We generated models and estimated tree numbers using exactly the
same approach as for the biome-level models. Total and biome-level tree
estimates did not differ significantly (at P < 0.05) from those generated using the
biome-level models (Extended Data Fig. 4).
Extended Data Figure 1 | Histogram of the collected measurements of forest tree density in each biome around the world (n = 429,775). The red line and the
blue dotted lines indicate the mean and median for the collected data, respectively. Data in each biome fitted a negative binomial error structure.
Extended Data Figure 2 | Histogram of the predicted forest tree density
values for the locations at which density measurements were collected in each
biome around the world (n = 429,775). The red line and the blue dotted lines
indicate the mean and median for the collected data, respectively. As our
models were based on mean values, the majority of points fall on or close to the
mean values in each biome.
Extended Data Figure 3 | Histogram of the total predicted forest tree
density values for each pixel within each biome around the world
(n = 429,775). This illustrates the spread of pixels throughout each biome, and
highlights that our map accounts for the sampling bias in tree density plots
(for example, although we had no zero values in our desert plots, the vast
majority of desert pixels contain no trees).
(Extended Data Figure 4 map panels: biome-level map, total trees 3,041,173,150,000; ecoregion-level map, total trees 3,252,929,340,000; legend from low 0 to high 3,654,825.)

Extended Data Figure 4 | Comparison between approaches to generate
the global tree density map. The initial map was generated using 14 biome-level
models (biomes delineated by The Nature Conservancy http://www.nature.org)
to account for broad-scale variations in terrestrial vegetation
types. With several thousand plot-level density measurements in most
biomes, this approach provided highly accurate estimates at the global scale.
However, to improve precision at the local scale, we also generated a map using
ecoregion-scale models. Separate models were generated within each of 813
global ecoregions (also delineated by The Nature Conservancy to reflect
smaller-scale vegetation types) using exactly the same statistical approach (see
Methods). The same 429,775 data points were used to construct each map.
Biome-level and ecoregion-level maps provide total tree estimates of 3.041 and
3.253 trillion trees, respectively.
Extended Data Table 1 | Estimates of the total tree number for each of the biomes that contain forested land, as delineated by The Nature
Conservancy (http://www.nature.org)

                       % total               Total trees        ±2 SD  % total
Biome                land area   n (plots)    (billions)   (billions)    trees
Boreal Forests          11.49%        8688        749.34        50.07   24.28%
Deserts                 21.01%       14637         52.95         2.92    1.75%
Flooded Grasslands       0.79%         271         64.58        14.19    2.13%
Mangroves                0.23%          21          8.18         0.26    0.27%
Mediterranean Forests    2.43%       16727         53.42         1.20    1.76%
Montane Grasslands       3.88%         138          60.3        24.04    1.99%
Temperate Broadleaf      9.32%      278395         362.6         2.90   11.98%
Temperate Conifer        3.18%       85144        150.57         1.34    4.97%
Temperate Grasslands     7.18%       17051        148.29         4.93    4.90%
Tropical Coniferous      0.48%           0         22.21         0.40    0.73%
Tropical Dry             2.85%         115        156.37        63.42    5.17%
Tropical Grasslands     14.66%         999        318.01        35.52   10.51%
Tropical Moist          14.81%        5321        799.45        23.98   26.41%
Tundra                   5.25%        2268         94.89         6.31    3.14%
Total                               429775       3041.17        96.07
Gain-of-function p53 mutants co-opt
chromatin pathways to drive cancer growth


Jiajun Zhu, Morgan A. Sammons, Greg Donahue, Zhixun Dou, Masoud Vedadi, Matthäus Getlik,
Dalia Barsyte-Lovejoy, Rima Al-awar, Bryson W. Katona, Ali Shilatifard, Jing Huang, Xianxin Hua,
Cheryl H. Arrowsmith & Shelley L. Berger


TP53 (which encodes p53 protein) is the most frequently mutated gene among all human cancers. Prevalent p53 missense
mutations abrogate its tumour suppressive function and lead to a ‘gain-of-function’ (GOF) that promotes cancer. Here
we show that p53 GOF mutants bind to and upregulate chromatin regulatory genes, including the methyltransferases
MLL1 (also known as KMT2A) and MLL2 (also known as KMT2D), and the acetyltransferase MOZ (also known as KAT6A or
MYST3), resulting in genome-wide increases of histone methylation and acetylation. Analysis of The Cancer Genome
Atlas shows specific upregulation of MLL1, MLL2 and MOZ in p53 GOF patient-derived tumours, but not in wild-type
p53 or p53-null tumours. Cancer cell proliferation is markedly lowered by genetic knockdown of MLL1 or by
pharmacological inhibition of the MLL1 methyltransferase complex. Our study reveals a novel chromatin mechanism
underlying the progression of tumours with GOF p53, and suggests new possibilities for designing combinatorial
chromatin-based therapies for treating individual cancers driven by prevalent GOF p53 mutations.


Most mutant forms of p53 are caused by single amino acid substitutions
mapping to the DNA-binding domain. These mutations result
in expression of full-length p53 protein, but loss of wild-type (WT)
tumour suppressive function. The high prevalence of missense
substitutions, particularly certain ‘hotspot’ mutations, suggests a
selective advantage during cancer progression. Indeed, these
mutants gain neomorphic oncogenic functions, including altered
cancer spectrum, deregulated metabolic pathways, increased
metastasis and enhanced chemotherapy resistance. Evidence from
recent studies points to one potential mechanism of GOF p53: it
functions through association with other transcription factors,
driving gene transcription in oncogenic pathways such as the
mevalonate pathway and the etoposide-resistance pathway. A
transcriptional mechanism is further supported by the importance of
retaining an intact transactivation domain for oncogenic GOF p53
function. Nevertheless, how GOF p53 contributes to major changes
in the cancer genome and transcriptome remains to be elucidated.
Altered chromatin pathways have been implicated in various aspects
of cancer, given their regulation of genome-wide transcription
programs. However, to our knowledge, there has been no evidence
to date of direct crosstalk between GOF p53 mutants and
chromatin regulation.


Genome-wide binding of GOF p53 mutants 


We carried out chromatin immunoprecipitation followed by sequencing
(ChIP-seq) to determine genome-wide binding locations of p53 in
a panel of breast cancer cell lines: MCF7 (wild-type p53), MDA-MB-175VII
(wild-type p53), HCC70 (p53(R248Q)), BT-549 (p53(R249S)),
and MDA-MB-468 (p53(R273H)). We found that p53 binding
to gene-proximal regions (less than 10 kilobases (kb) from transcription
start sites (TSS)) was highly similar between the two wild-type p53 cell
lines, whereas these wild-type p53 peaks were highly dissimilar
from the peaks in any of the GOF p53 mutants. Notably, p53 binding
patterns in the three GOF p53 cell lines were similar to
each other (Fig. 1a and Extended Data Fig. 1a). In addition,
we aligned published p53(R248W) ChIP-seq data from Li-Fraumeni
syndrome (LFS) MDAH087 cells, and again, TSS-proximal peaks
of p53(R248W) resembled those of p53(R273H) and p53(R248Q)
(Extended Data Fig. 1b, c), but were distinct from the wild-type p53
peaks (Extended Data Fig. 1d, e).
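The TSS-proximal filter can be sketched with a binary search over sorted TSS coordinates. The sketch assumes each peak is summarised by its summit position and applies the strict "less than 10 kb" cut-off from the text; strand, peak width and the choice of annotation are ignored, and all names and coordinates are hypothetical.

```python
import bisect

PROXIMAL_BP = 10_000  # "less than 10 kb" criterion from the text


def tss_proximal(peaks, tss_by_chrom, max_dist=PROXIMAL_BP):
    """Return peaks whose summit lies < max_dist bp from the nearest TSS.

    peaks: list of (chrom, summit); tss_by_chrom: chrom -> sorted TSS positions.
    """
    proximal = []
    for chrom, summit in peaks:
        sites = tss_by_chrom.get(chrom, [])
        i = bisect.bisect_left(sites, summit)
        # The nearest TSS is either just before or just after the insertion point.
        distances = [abs(sites[j] - summit) for j in (i - 1, i) if 0 <= j < len(sites)]
        if distances and min(distances) < max_dist:
            proximal.append((chrom, summit))
    return proximal


tss = {"chr1": [1_000, 50_000]}  # hypothetical annotation
peaks = [("chr1", 5_000), ("chr1", 30_000), ("chr2", 100), ("chr1", 59_999)]
print(tss_proximal(peaks, tss))
```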

We performed motif analysis for TSS-proximal peaks of the
p53(R273H) mutant, which predicted the E26 transformation-specific
(ETS) motif as the most enriched (Extended Data Fig. 2a), distinct
from the wild-type p53 motif (Extended Data Fig. 2b). One
ETS family member, ETS2, has been shown to consistently associate
with mutant p53 (ref. 8). We confirmed that ETS2 interacts with
various GOF p53 mutants, but interacts to a much lesser extent with
wild-type p53 (Fig. 1b and Extended Data Fig. 2c), as previously
noted. Co-immunoprecipitation at endogenous protein levels also
demonstrated that ETS2 interacts with GOF p53, but not with wild-type
p53 (Extended Data Fig. 2d, e). We analysed ChIP-seq data sets
from the ENCODE project for all transcription factors, and
observed that, compared to other transcription factors, ETS family
proteins have significantly higher overlap with GOF p53 TSS-proximal
peaks, but not with wild-type p53 TSS-proximal peaks (Extended Data
Fig. 2f, g). Notably, in both wild-type and GOF p53 cases, the RNA
polymerase II (Pol II) group has the highest percentage overlap with p53
peaks, indicative of transcriptional activity. The extent of Pol II overlap
is similar to the ETS group in GOF p53 cells, but much higher than the
ETS group in wild-type p53 cells (Extended Data Fig. 2f, g).
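The overlap comparison can be illustrated as the percentage of p53 peaks that intersect at least one peak of a given factor. This sketch assumes half-open (start, end) intervals, as in BED files, and uses a simple scan suited to small examples; the text does not state how overlap was defined or computed, so treat this as one plausible definition. Names and coordinates are hypothetical.

```python
def percent_overlap(query, reference):
    """Percentage of query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end), half-open as in BED files.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    if not query:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in query
        if any(s < end and start < e for s, e in by_chrom.get(chrom, []))
    )
    return 100.0 * hits / len(query)


p53_peaks = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]  # hypothetical
tf_peaks = [("chr1", 50, 60), ("chr2", 50, 60)]
print(round(percent_overlap(p53_peaks, tf_peaks), 1))  # 33.3
```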
Figure 1 | Genome-wide binding of GOF p53 mutants. a, Area under the
curve analysis showing p53 enrichment (ChIP/input) in five cell lines over
TSS-proximal peak regions identified in each cell line. Mann-Whitney U-tests
were performed to compute significance for combined wild-type (WT) and GOF
p53 peaks: MCF7 (P = 2.78 X 10 °), MDA-MB-175VII (P = 2.15 X 10 *),
MDA-MB-468 (P < 2.2 × 10^-16), HCC70 (P = 1.09 X 10°), BT-549
(P = 3.7 X 10°). b, Co-immunoprecipitation of HEK293T cell-expressed
Flag-ETS2 with in vitro-expressed GFP- or HA-tagged p53, followed by western blot.
c, GO analysis of p53(R273H) TSS-proximal peaks (statistics are shown in
Supplementary Table 1). Uncropped blots are in Supplementary Fig. 1.


GOF p53 targets chromatin regulators 

To determine specific functional categories, we performed gene ontology
(GO) analysis on TSS-proximal peaks. As expected, DNA
damage response pathways were most enriched in wild-type p53 targets
(Extended Data Fig. 2h and Supplementary Table 1). In contrast,
p53(R273H) bound to genes related to translation and ribosomal
synthesis (Fig. 1c and Supplementary Table 1), which was reasonable
given the rapid growth rate of these cells. We were particularly
intrigued by GOF p53 binding to a group of genes functionally related
to histone methylation (Fig. 1c). This was seen in UCSC Genome
Browser views at MLL1 (KMT2A) and MLL2 (KMT2D), genes encoding
methyltransferases of histone H3 lysine 4 (H3K4) (Fig. 2a) that are
components of alternative forms of the COMPASS complex (complex
proteins associated with Set1). The other two GOF p53 mutants that
we examined, as well as p53(R248W) from LFS MDAH087 cells,
all showed similar binding at MLL1 and MLL2 (Extended Data
Fig. 3a, b, e, f). UCSC Genome Browser views confirmed binding of
GOF p53 to a gene encoding a common subunit of COMPASS complexes,
RBBP5 (Extended Data Fig. 3h). In contrast, wild-type p53 did
not appear to bind any of these genes, although as expected it bound
promoter regions of its canonical targets, including CDKN1A (which
encodes p21 protein), MDM2 and BBC3 (also known as PUMA)
(Fig. 2a and Extended Data Fig. 3c, i, j). We then analysed a large
set of 600 chromatin regulators for potential GOF p53 binding, and
found an additional group of chromatin regulatory genes that showed
peak enrichment (Supplementary Table 2). Of particular interest
among these was MOZ (KAT6A), a histone acetyltransferase, and
UCSC Genome Browser views confirmed the presence of GOF p53,
but not wild-type p53, at this gene (Fig. 2a and Extended Data Fig. 3d, g).

Using ChIP-quantitative PCR (ChIP-qPCR), we validated the
binding of GOF p53 to MLL1, MLL2 and MOZ genes, but not immediately
upstream or downstream of the peak regions (Fig. 2b and
Extended Data Fig. 4a-c). Moreover, we confirmed GOF p53 binding
to all other targets in the ‘histone methylation’ GO category (RBBP5,
OGT and PPP1CC), and to a few additional chromatin factors
(including SMARCD2 and DCAF10), in all three GOF p53 cell lines
(Extended Data Fig. 4d-f). We verified the ChIP-qPCR results with a
second p53 polyclonal antibody, FL393 (Extended Data Fig. 4g). In
parallel experiments with both p53 antibodies, wild-type p53 showed
binding to the CDKN1A and MDM2 canonical binding sites, but not
to any of the GOF p53 targets tested (Fig. 2c and Extended Data
Fig. 4h). We also examined a pancreatic cancer cell line, PANC-1
(p53(R273H)), and observed a similar binding pattern (Extended
Data Fig. 4i), suggesting a general phenomenon in various cancer
types. Furthermore, the ChIP-qPCR signal of GOF p53 was attenuated
upon p53 knockdown (Fig. 2d). Knockdown of ETS2 also led to
reduced binding of GOF p53 over MLL1 and MOZ, and to a lesser
extent, over the MLL2 peak region (Fig. 2e). To test the association of
GOF p53 with Mll1 in a non-tumour background, we performed
ChIP-qPCR in primary mouse embryonic fibroblasts (MEFs) with
GOF p53 or wild-type p53; consistently, mouse GOF p53 showed
significant enrichment over the Mll1 promoter region (Fig. 2f).

Figure 2 | GOF p53 mutants directly target chromatin regulators. a, UCSC
Genome Browser views of p53 occupancy over promoter regions of MLL1,
MLL2, MOZ and CDKN1A. b, c, ChIP-qPCR showing p53 or IgG enrichment
(ChIP/input) in MDA-MB-468 (b) and MDA-MB-175VII (c) cells. BS, p53
binding site. Up, upstream of GOF p53 binding region; peak, GOF p53 binding
region; down, downstream of GOF p53 binding region. A schematic of
amplicon locations is shown in Extended Data Fig. 4a. d, e, ChIP-qPCR
showing p53 enrichment changes upon reduction of p53 (d) or ETS2 (e) by
shRNA-mediated knockdown. Numbers 20 and 21 denote two short hairpins,
the sequences of which are shown in Supplementary Table 3. f, ChIP-qPCR
showing p53 or IgG enrichment in MEFs bearing wild-type p53 or
p53(R172H). Error bars represent mean ± s.e.m.; n = 3; two-tailed Student’s
t-test: *P < 0.05; **P < 0.01; ***P < 0.001.


GOF p53 regulates MLL, MOZ, and histone modifications 


To examine whether GOF p53 is required for expression of the chro- 
matin regulators, we reduced GOF p53 levels in human cancer cells 
and found that the mRNA levels of MLL1, MLL2 and MOZ were also 
decreased (Fig. 3a and Extended Data Fig. 5a), whereas no change
was detected when the level of wild-type p53 was reduced (Extended
Data Fig. 5b). Simply increasing wild-type p53 protein levels through
nutlin-mediated stabilization did not recapitulate
activation of the chromatin regulators (Extended Data Fig. 5c, d).
MLL1 protein levels were also decreased upon GOF p53 knockdown
(Fig. 3b), but not upon wild-type p53 knockdown (Extended Data Fig.
5e), as was also observed for MOZ protein levels (Extended Data
Fig. 5f). Reduction of ETS2 levels led to decreased expression of MLL1
and MOZ, and to a lesser extent, MLL2 (Fig. 3c and Extended Data 
Fig. 5g), which was in accordance with the relative binding changes of 
GOF p53 to these genes (Fig. 2e). We verified the ETS2 knockdown 
result in another GOF p53 cell line, BT-549, and detected similarly 
decreased expression of MLL1 and MOZ, and to a lesser extent, MLL2 
(Extended Data Fig. 5h, i). We performed Pol II ChIP-qPCR and 
observed concomitantly decreased Pol II occupancy specifically over 
MLL1, MLL2, and MOZ TSS regions upon ETS2 knockdown (Fig. 3d). 
We examined the importance of another ETS family member, ETS1. 
By contrast, ETS1 knockdown had no effect on the expression of 
MLL1, MLL2 or MOZ (Extended Data Fig. 5j, k), nor did it alter 
GOF p53 or Pol II binding (Extended Data Fig. 5l, m). As the ETS family
comprises 28 members, additional ETS proteins other than ETS1 are
likely to be involved. Nevertheless, our observations are consistent
with previous studies showing that ETS2, but not ETS1, is important
in mediating GOF p53 function.

The regulation of histone-modifying enzymes led us to investigate
the cognate histone post-translational modifications (PTMs). We
observed a global decrease in histone H3 lysine 9 acetylation (H3K9ac,
catalysed by MOZ) in response to knockdown of GOF p53, whereas
other histone acetylation marks did not show notable changes (Fig. 3e
and Extended Data Fig. 5n, o, p). The reduction of H3K9ac was
also observed when the level of MOZ itself was decreased by short
hairpin RNA (shRNA) (Extended Data Fig. 5q). In contrast, H3K4
tri-methylation and H3K4 mono-methylation (H3K4me3 and
H3K4me1, catalysed by MLL1 and MLL2, respectively) showed only
a slight global reduction upon GOF p53 knockdown (Fig. 3e and
Extended Data Fig. 5n, o, p). This is reasonable, however, given that
H3K4 is methylated by six members of the COMPASS complexes,
and previous studies showed that inhibiting or knocking out one of them
did not substantially change global H3K4 methylation.

We further validated the regulation of Mll1, Mll2 and Moz by GOF
p53 in the knock-in MEFs. We found significantly higher expression
of these genes in GOF p53 MEFs than in wild-type p53 MEFs or in
MEFs derived from p53 (gene Trp53) knockout mice (p53 null MEFs)
(Fig. 3f and Extended Data Fig. 6a-c). Furthermore, when GOF p53
was reduced, Mll1 expression was also lowered (Fig. 3g and Extended
Data Fig. 6d), and ectopically expressing GOF p53 in p53 null MEFs
enhanced Mll1 expression (Extended Data Fig. 6e, f). GOF p53 MEFs
also showed a higher global level of H3K9ac, and a slight elevation of
H3K4me3, compared with wild-type p53 or p53 null MEFs (Fig. 3h
and Extended Data Fig. 6g). Notably, other histone modifications 
associated with active gene transcription, including H3K27ac and 
H3K36me3, remained at comparable levels (Fig. 3h). In addition,
neither H3K4me3 nor H3K9ac changed upon knockdown of wild-type
p53 (Extended Data Fig. 6h), even though cell growth was increased as
expected (Extended Data Fig. 6i). Together, these data suggest that
the changes in H3K4me3 and H3K9ac are specific to GOF p53 directly
activating the MLL1 and MOZ enzymes.

The modest global change in H3K4me3 in the presence of GOF 
p53 prompted investigation of local changes in H3K4 methylation. 
We performed RNA-seq and H3K4me3 ChIP-seq in MEFs with 
endogenous wild-type p53 or GOF p53. Compared with the genome-wide
average, known MLL1 target genes were more highly
expressed and displayed higher H3K4me3 enrichment in GOF p53
MEFs (Extended Data Fig. 6j). For example, we observed increased
H3K4me3 levels and RNA expression within the Hoxa gene cluster
(Fig. 3i and Extended Data Fig. 6k), a well-studied target of MLL1
that is commonly upregulated in leukaemia. Conversely, wild-type
p53 targets, such as Cdkn1a, showed decreased RNA expression and
TSS-associated H3K4me3 in GOF p53 MEFs (Extended Data Fig. 6l).
Notably, H3K4me3 enrichment at the TSS of genes in GOF p53 MEFs 
was slightly, but significantly higher at a genome-wide level than in 
wild-type p53 MEFs (Extended Data Fig. 6m), consistent with the 
slight global increase of H3K4me3 (Fig. 3h). We validated the 
H3K4me3 ChIP-seq and RNA-seq results by ChIP-qPCR and RT- 
qPCR, respectively, observing significantly higher H3K4me3 enrich- 
ment in GOF p53 MEFs, and higher expression of Hox genes, than in 
wild-type p53 or p53 null MEFs (Fig. 3j and Extended Data Fig. 6n). 


MLL1 is essential for cancer phenotype of GOF p53

Previous studies have revealed that cells expressing GOF p53 rely on it
for cell growth and survival. GOF p53 knockdown in cancer cells
led to a strong decrease in cell proliferation (Extended Data Fig. 7a).
By contrast, lowering of wild-type p53 levels resulted in elevated
growth (Extended Data Figs 6i and 7b). To investigate the function of
GOF p53 in driving chromatin regulators, we carried out the same time
course, and found that the reduction of MLL1 or MLL2 in GOF p53
cancer cells led to a striking loss of cell growth (Extended Data Fig. 7c),
phenocopying the knockdown of GOF p53 itself. By contrast,
knockdown of MLL1 or MLL2 had a minimal effect on wild-type p53 cancer
cells (Extended Data Fig. 7d, e).

Figure 3 | GOF p53 mutants regulate MLL and

We addressed the importance of this pathway to tumour-relevant 
phenotypes, first by examining the ability of cancer cells to form
colonies. Reduction of MLL1 decreased the colony formation ability of
MDA-MB-468 cells (p53(R273H)) (Fig. 4a and Extended Data Fig. 7f),
but had little effect on the colony formation efficiency of MCF7 cells 
(wild-type p53) (Fig. 4b and Extended Data Fig. 7g). Similar results were 
observed in breast cancer cells BT-549 (p53(R249S)) and pancreatic 
cancer cells PANC-1 (p53(R273H)) (Extended Data Fig. 7h, i). We 
further confirmed the tumour formation phenotype in anchorage-inde- 
pendent growth assays in soft agar, showing that decreasing MLL1 
specifically reduced the growth and colony size of GOF p53 cancer cells, 
but not wild-type p53 cancer cells (Extended Data Fig. 7j, k). We also 
investigated tumour growth in NOD-scid-gamma (NSG)
immunodeficient mice. Knockdown of MLL1 led to strongly reduced tumour
formation ability in GOF p53 cells, as compared to GOF p53 cells with 
a non-targeting scrambled control knockdown (Fig. 4c, e). In contrast, 
MLL1 knockdown did not alter the tumour formation ability of wild- 
type p53 cancer cells (Fig. 4d, e), again supporting a specific role for 
MLL1 in cancers with GOF p53, but not wild-type p53.

To further explore a critical role that these chromatin regulators 
may play in supporting growth of GOF p53 cells, and to rule out 
possible confounding factors in established cancer cell lines, we
performed Mll1 knockdown in the primary MEFs with knock-in GOF
p53. Consistently, MLL1 reduction resulted in decreased proliferation 
of GOF p53 MEFs (Extended Data Fig. 8a). Importantly, re-expression 
of MLL1 in GOF p53 MEFs with p53 knockdown partially rescued the 
growth defects (Fig. 4f); partial rescue probably results from GOF p53 
driving expression of multiple downstream targets, as described above. 
These results strongly indicate a direct role of MLL1, functioning
downstream of GOF p53 in maintaining proliferation of GOF p53
cells. We also performed MLL1 knockdown (Extended Data Fig. 8b)
in human non-cancer LFS cells, MDAH087 (p53(R248W)) and
MDAH041 (p53 null; Extended Data Fig. 8c). Similar to the results
obtained in cancer cells and in MEFs, MLL1 knockdown reduced the 
growth rate of GOF p53 LFS cells (Fig. 4g and Extended Data Fig. 8d), 
again phenocopying the knockdown of GOF p53 itself (Extended Data 
Fig. 8e), but did not reduce the growth of either p53 null LFS cells
(Fig. 4h and Extended Data Fig. 8f) or primary non-cancer cells with
wild-type p53 (IMR90 lung fibroblasts, Extended Data Fig. 8g, h).
Re-expression of MLL1 again partially rescued the growth reduction caused by
GOF p53 knockdown in LFS cells (Extended Data Fig. 8i). In addition,
MLL2 knockdown also decreased the proliferation of GOF p53 LFS cells, but
not of p53 null LFS cells (Extended Data Fig. 8j, k).


COMPASS inhibitors reduce GOF p53 cell growth 


Chromatin regulators have emerged as promising targets of small
molecule compounds in various human diseases including cancer.
Menin is a scaffold protein of the COMPASS complex, directly
interacting with the amino terminus of MLL1, and is crucial for
MLL1 activity and for maintenance of a subtype of leukaemia. We
treated both GOF p53 and p53 null LFS cells with the previously 
reported menin antagonist, MI-2-2 (refs 34, 35). In agreement with 
the MLL1 genetic knockdown experiments, MI-2-2 showed a dose- 
dependent inhibition of GOF p53 cell growth (Fig. 5a), but had very 
little effect on p53 null cells (Fig. 5b). 

Recently, inhibition of MLL1 function has also been demonstrated
by targeting its interaction with the WDR5 subunit of the COMPASS
complex. As a second approach to pharmacological inhibition of
MLL1 activity, we used OICR-9429, a newly characterized antagonist of
the interaction of WDR5 with MLL1 (ref. 38). This non-peptide, drug-like
molecule binds to WDR5 in the MLL1 binding site of WDR5
(Kd = 93 ± 28 nM), and disrupts the assembly of the WDR5/MLL1/
RbBP5 complex in cells with half-maximum inhibitory concentration
(IC50) values below 1 μM. In striking similarity to MI-2-2, we found a
dose-dependent inhibition by OICR-9429 of GOF p53 LFS cell growth
(Fig. 5c), and, again, little effect on p53 null LFS cells (Fig. 5d).

[Figure 4 image: colony formation panels for MDA-MB-468 (R273H) and MCF7 (WT); growth curves (cell number versus day) for MEF (R172H) with Ctrl KD or Trp53 KD (nos 12359 and 54549) plus Vec or MLL1, and for LFS-MDAH087 (R248W) and LFS-MDAH041 (null) with Ctrl KD or MLL1 KD no. 14.]

Figure 4 | MLL1 knockdown reduces the proliferation and cancer
phenotype of GOF p53 cells. a, b, Colony formation (left) and quantification
(right) in MDA-MB-468 (a) or MCF7 (b) cells with non-targeting control (ctrl)
or MLL1 knockdown (KD). Two-tailed Student’s t-test; **P < 0.01; NS,
P > 0.05; n = 3; the other two biological replicates are shown in Extended
Data Fig. 7f, g. c, d, Excised xenograft tumours 20 weeks after NSG
immunodeficient mice were subcutaneously injected with MDA-MB-468 (c) or
MCF7 (d) cells carrying control or MLL1 knockdown. Two representative
images out of four total in each group are shown. e, Xenograft tumour volumes
measured 10 weeks after initial injection described in c and d. Palpable tumours
at a size below 4 mm³ were recorded as 4 mm³ due to difficulties in
measurement. Zeros indicate that the mouse did not have a palpable tumour.
Red horizontal lines show the average tumour volume of all four mice in each
group. Mann-Whitney U-test; ***P < 0.001; NS, P > 0.05. f, Growth curve
analysis and corresponding western blot in p53(R172H) MEFs with control or
p53 knockdown, and vector control (Vec) or MLL1 overexpression.
g, h, Growth curve analysis in LFS MDAH087 (g) or MDAH041 (h) cells with
control or MLL1 knockdown. Uncropped blots shown in Supplementary Fig. 1.
Error bars represent mean ± s.e.m.; n = 3.

Figure 5 | COMPASS inhibitors specifically
reduce GOF p53 cell growth. a, b, Growth curve
analysis of LFS MDAH087 (a) and MDAH041
Moreover, in the genetically controlled MEF cells, we observed similar 
results, that OICR-9429 specifically inhibits cell proliferation of GOF 
p53 MEFs (Fig. 5e, f), but not when GOF p53 is reduced (Fig. 5e) or in 
p53 null MEFs (Fig. 5f). These results provide strong evidence for a
specific growth inhibitory effect of pharmacological agents that target
MLL COMPASS complex activity downstream of GOF p53.

We examined the significance of our findings in the context of 
human tumour samples, by analysing The Cancer Genome Atlas 
(TCGA). Based on p53 mutational status, we grouped tumour 
samples into: (1) wild type (no detectable p53 mutation); (2) GOF 
(missense mutation of R175H, R248Q, R248W, R249S or R273H); 
or (3) p53 null (p53 nonsense mutations or frameshift truncations). 
Tumours with other types of p53 mutations (other missense muta- 
tions, inframe insertion/deletion, or splicing mutations) were not 
included in further analysis, due to an unpredictable effect on the 
downstream chromatin regulators. We also focused our analysis on
cancer types in which more than 5% of samples fall into the GOF p53
group. We first combined all samples from these cancer
types, and observed significantly higher RNA expression of MLL1,
MLL2 and MOZ in GOF p53 tumours, compared to either wild-type 
p53 or p53 null tumours (Fig. 5g, top panels). As controls, expression 
levels of housekeeping genes including actin (ACTB) and GAPDH are 
consistent across the three groups (Fig. 5g, middle panels), whereas 
expression levels of wild-type p53 targets CDKN1A, MDM2 and
PUMA are significantly higher in the wild-type p53 group than in the
GOF p53 or p53 null group (Fig. 5g, lower panels). Next, we examined
individual cancer types and observed gene expression patterns similar to
those of all cancers combined (Extended Data Fig. 9a-f). Notably, given
the heterogeneous population of tumour samples, and the small sample 
size of certain groups, not all pairwise comparisons are statistically 
significant, although the same trends always hold that GOF p53 
tumours express higher levels of MLL1, MLL2, and MOZ than the other 
two groups. The same holds for canonical wild-type p53 targets:
although not all comparisons are statistically significant, the wild-type
p53 groups always show higher levels of CDKN1A, MDM2 and PUMA
than the GOF p53 or p53 null tumour groups.
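The three-way grouping of TCGA tumours by p53 status can be expressed as a small classifier. The sketch below is illustrative only: it assumes TP53 calls arrive as (variant class, protein change) pairs using MAF-style labels such as `Missense_Mutation` and `Nonsense_Mutation`, it assumes a tumour carrying only truncating calls counts as null, and the function name `classify_tp53` is hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hotspot substitutions that define the GOF group in the text.
GOF_HOTSPOTS = {"R175H", "R248Q", "R248W", "R249S", "R273H"}

# Truncating variant classes (assumed MAF-style labels) that define p53 null.
TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins"}


def classify_tp53(mutations: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return 'WT', 'GOF', 'null', or None (excluded) for one tumour.

    `mutations` holds (variant_class, protein_change) pairs for TP53,
    e.g. ("Missense_Mutation", "R273H").
    """
    muts = list(mutations)
    if not muts:
        return "WT"  # no detectable p53 mutation
    if len(muts) == 1:
        vclass, change = muts[0]
        # A single missense call at one of the listed hotspots.
        if vclass == "Missense_Mutation" and change in GOF_HOTSPOTS:
            return "GOF"
    if all(vclass in TRUNCATING for vclass, _ in muts):
        return "null"
    # Other missense, in-frame indels, splice sites, or mixed calls.
    return None
```

Returning `None` for everything else mirrors the exclusion of tumours whose p53 mutations have an unpredictable effect on the downstream chromatin regulators.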


Discussion 


Our results indicate that distinct prevalent GOF p53 mutants bind to a
common newly identified group of gene targets genome-wide, to drive
expression of genes comprising a chromatin signature. GOF p53
binding sites are closely associated with ETS motifs, and
GOF p53 binds directly to ETS2, indicating that the substitutions in
the DNA-binding domain of p53 unleash a latent interaction with
ETS family transcription factors, as previously suggested. Within this
chromatin signature gene group targeted by GOF p53, the COMPASS 
methyltransferase pathway appears to be particularly well repre- 
sented, but the new binding includes other chromatin regulators, such 
as the acetyltransferase MOZ. We find that expression of these modi- 
fying enzymes is dependent on GOF p53, which in turn elevates 
activating histone modifications, including H3K4me3 and H3K9ac. 
Our evidence points to MLL downstream pathways as key targets of 
GOF p53. Thus, as is the case in leukaemia bearing translocations of 
MLL, MLL pathways may contribute to GOF p53 oncogenic pheno- 
types and therefore cancer progression. 

Importantly, our findings in both human cancer cells and LFS cells 
show that GOF p53 cells lose growth and tumour formation potential 
with similar kinetics upon knockdown of MLL1 as they do
with knockdown of GOF p53. In a key comparison, cancer and LFS
cells that express wild-type p53 or are null for p53 show very little
response to MLL1 knockdown. Hence, GOF p53 cells appear particularly
dependent for growth on the MLL1 pathway. We provide further
evidence of GOF p53 cell growth dependence on the COMPASS 
complex, by analysing cell sensitivity to two different pharmacological 
small compound inhibitors. These compounds target menin or 
WDR5 interaction with MLL1, and inhibit proliferation of LFS cells
and MEFs expressing GOF p53 but not of p53 null cells. The effects of the
inhibitors are thus analogous to direct knockdown of MLL1. Hence, 
we conclude that a large cohort of GOF-p53-driven cancers, the 
growth of which was not previously known to be dependent on chro- 
matin pathways, may be amenable to epigenetic therapeutics. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized. The investigators were not blinded to 
allocation during experiments and outcome assessment. 

Cell culture. MCF7, MDA-MB-175VII, HCC70, BT-549, and MDA-MB-468 cell 
lines were obtained from American Type Culture Collection (ATCC), and were 
cultured in a 37 °C incubator at 20% oxygen, in standard tissue culture medium 
(DMEM with 10% FBS, 100 units per ml penicillin and 100 μg per ml
streptomycin) supplemented with non-essential amino acids. Li-Fraumeni Syndrome cell lines
MDAH087 and MDAH041 were obtained from Michael A. Tainsky (Wayne 
State University, Detroit, MI) as a gift, and were cultured in a 37°C incubator 
at 3% oxygen, in standard tissue culture medium. R172H knock-in mice were 
generated by Tyler Jacks (Massachusetts Institute of Technology) and obtained
from the NCI Mouse Repository. Primary MEFs from 13.5-day embryos were
generated as previously described, and cultured in standard tissue culture
medium in a 37 °C incubator at 3% oxygen condition.

Western blot and antibodies. Cells were lysed in modified RIPA buffer contain- 
ing 150 mM NaCl, 1% NP-40, 50 mM Tris-Cl, pH 8.0, and 1% SDS, supplemented 
with protease inhibitors (Life Technologies, number 78446) before use. Protein 
concentration was determined by BCA protein assay (Life Technologies, number 
23227), following which equal amounts of protein were loaded and separated in
polyacrylamide gels. Proteins were then transferred to nitrocellulose membrane. 
Antibodies used in this study were as follows: p53 monoclonal antibody DO-1 
(Calbiochem EMD); p53 polyclonal antibody FL393 (Santa Cruz Biotechnology, 
sc-6243); Flag (Sigma, M2, F1804), HA (Rockland, 600-401-384), histone H3
(abcam, ab1791), H3K4me1 (abcam, ab8895), H3K4me2 (Active Motif, 39142),
H3K4me3 (abcam, ab8580), H3K9ac (Active Motif, 39137), H3K14ac (Active 
Motif, 39616), H3K27ac (abcam, ab4729), H3K36me3 (abcam, ab9050), ETS2 
(Santa Cruz Biotechnology, sc-351), MLL1 (Bethyl Laboratories, A300-086A), 
MOZ (Novus Biologicals, 21620002), mouse p53 antibody for ChIP experiments 
(Santa Cruz Biotechnology, sc-1312 (M-19)), mouse p53 antibody for western 
blot analysis (Cell Signaling Technology, number 2524), RNA polymerase II 
(abcam, ab817). 

Co-immunoprecipitation. Flag-tagged ETS2 was transfected (Life
Technologies, number 11668019) and expressed in HEK293T cells and then
subjected to immunoprecipitation with Flag antibody-conjugated protein G
Dynabeads (Life Technologies, number 10004D). Following stringent washes,
HA-tagged wild-type p53 or GOF p53 (generated by in vitro translation
(Thermo, number 88881)) was added to co-immunoprecipitate with Flag-
ETS2 in buffer containing: 20 mM Tris, pH 8.0, 137 mM NaCl, 1 mM MgCl2,
1 mM CaCl2, 1% NP-40, and protease inhibitors. Endogenous
co-immunoprecipitation experiments were performed in buffer containing: 20 mM Tris, pH 8.0,
137 mM NaCl, 1 mM MgCl2, 1 mM CaCl2, 1% NP-40, 10% glycerol, with protease
and phosphatase inhibitors, and 12.5 U ml−1 benzonase (Novagen, 70746).
Bacterial expression and GST pulldown. GST-tagged ETS2 constructs were 
transformed and expressed in BL21-CodonPlus E. coli. Bacterial lysates were 
incubated with glutathione beads (Life Technologies, number G2879) at 4 °C
for 2 h, and washed four times with buffer containing 50 mM Tris, pH 7.5,
150 mM NaCl, 1% Triton, 1 mM DTT, supplemented with 100 μM PMSF. The
in vitro translated (Thermo, number 88881) HA-tagged wild-type p53 or GOF
p53 proteins were pre-cleared with GST at 4 °C for 1 h and the resulting
supernatant was subjected to GST pulldown with GST or GST-ETS2. The product was
then washed and subjected to western blot analysis. 

RT-qPCR, ChIP-qPCR and ChIP-sequencing. RNA was isolated from cells 
using the RNeasy kit (Qiagen, number 74106). RNA was then reverse transcribed to
cDNA (Life Technologies, number 4387406), and qPCR was then performed for
quantification using standard procedures on a 7900HT Fast-Real-Time PCR platform
(ABI). ChIP was performed as previously described, with modifications. In brief,
cells were crosslinked in 1% formaldehyde (Thermo, number 28906) in PBS for
10 min at room temperature. After glycine quenching, cell pellets were collected
and lysed as previously described, and then subjected to sonication with the
Covaris sonicator (S220). The supernatant was then diluted in the same sonication 
buffer but without N-lauroylsarcosine, and subjected to immunoprecipitation with 
corresponding antibodies at 4°C overnight. The beads were then washed and DNA 
was reverse-crosslinked and purified. Following ChIP, DNA was quantified by 
qPCR using standard procedures on a 7900HT Fast-Real-Time PCR platform 
(ABI), or sequencing libraries were prepared using the NEBNext Ultra library
preparation procedure and then sequenced on the Illumina HiSeq platform at the
Next-Generation Sequence Core at the University of Pennsylvania, or on the Illumina
NextSeq platform in the Epigenetics Program at the University of Pennsylvania. All
primer sequences used in this study are available in Supplementary Table 3. 
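The Methods do not state how ChIP/input enrichment or relative mRNA levels were derived from qPCR cycle thresholds (Ct). A common approach, shown here as an assumption rather than the authors' exact procedure, rescales the input Ct to 100% of chromatin before comparing it with the ChIP Ct, and uses the 2^-ΔΔCt method for RT-qPCR. Both helpers (`chip_over_input`, `relative_expression`) are hypothetical and assume perfect doubling per cycle.

```python
import math


def chip_over_input(ct_chip: float, ct_input: float, input_fraction: float) -> float:
    """ChIP signal as a fraction of total input chromatin.

    Assumes 2-fold amplification per cycle and that the input sample
    represents `input_fraction` (e.g. 0.01 for 1%) of the chromatin per ChIP.
    """
    # A 100% input would need log2(1/fraction) fewer cycles to cross threshold.
    ct_input_100 = ct_input - math.log2(1.0 / input_fraction)
    return 2.0 ** (ct_input_100 - ct_chip)


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target gene versus a control sample (2^-ddCt)."""
    d_ct_sample = ct_target - ct_ref        # normalize to a reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_sample - d_ct_ctrl)
```

For example, with an input representing 1/8 of the chromatin (Ct 25) and a ChIP Ct of 27, `chip_over_input(27.0, 25.0, 0.125)` gives 2^-5, or about 3% of input.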
Growth curve measurement. 200,000 cells were seeded on 950 mm² surface area
(one well of a 6-well plate) on day 0. Cell number was measured every two days with a
Countess automated cell counter (Life Technologies) following standard procedures
and default parameter settings, after which 200,000 cells were plated back
for the next count. For shRNA-mediated knockdown experiments, cells were 
seeded 7 days after the initial infection of shRNA-containing lentivirus, during 
which puromycin selection was completed and cells were returned to normal 
growth medium. For small compound inhibitor treatment experiments, inhibi- 
tors or DMSO vehicle control were added on day 0 as cells were seeded, and 
refreshed every other day as cells were counted and replated. All short hairpin 
sequences used in this study are available in Supplementary Table 3. 
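Because only 200,000 cells are replated at each count, the raw counts describe growth within each two-day interval rather than total expansion. One way to turn serial counts into a cumulative curve is to chain the fold expansions, as sketched below; whether the published curves were plotted this way is not stated, so treat this as an assumption, and `cumulative_counts` is a hypothetical helper.

```python
from typing import List

SEED = 200_000  # cells plated on day 0 and at every replating, per the Methods


def cumulative_counts(counts: List[int], seed: int = SEED) -> List[float]:
    """Convert counts from serial replating into a cumulative growth curve.

    counts[i] is the number counted at the (i+1)-th time point, after which
    `seed` cells were replated. Returns the projected total cell number if no
    cells had been discarded, starting with the day-0 seed.
    """
    curve = [float(seed)]
    for n in counts:
        # Fold expansion over this interval, applied to the running total.
        curve.append(curve[-1] * n / seed)
    return curve
```

With counts of 400,000 on day 2 and 600,000 on day 4, the cumulative curve is 200,000, 400,000 and 1,200,000 cells on days 0, 2 and 4.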
Colony-formation assay. After lentiviral infection of shRNA constructs and 
puromycin selection, 2,000 cells were seeded per well in 6-well plates. After three 
weeks, cell colonies were fixed with 1% paraformaldehyde and stained with 0.1% 
crystal violet (for 15 min). For quantification, the crystal violet dye was released 
into 10% acetic acid and measured at 590 nm (OD590).

Soft agar anchorage-independent growth assay. The base layer of soft agar 
contained complete DMEM media (10% FBS, 100 units per ml penicillin and 
100 μg per ml streptomycin) with 1% agar; the top layer of soft agar contained
complete DMEM media with 0.7% agarose and was mixed with 5,000 cells and
plated over the base layer. Colonies were fixed and stained with 0.005% crystal
violet (for 1 h), and visible colonies were counted.

Tumour xenograft assay. A total of four male and four female mice (Mus
musculus, strain NOD.Cg-Prkdc<scid> Il2rg<tm1Wjl>/SzJ, Jackson Labs (stock number
005557)) aged 38 to 45 days were used per treatment for
tumour xenograft experiments. All animal experiments described adhere to policies 
and practices approved by the University of Pennsylvania Institutional Biosafety 
Committee (IBC) and the Institutional Animal Care and Use Committee (IACUC).
Cells were collected after shRNA (MLL1 or non-targeting control) mediated 
knockdowns. Then 1.5 million cells were injected subcutaneously per mouse. 
Tumour size was measured by calipers 10 weeks after subcutaneous injection. 
Tumour size was measured in two dimensions, and tumour volume was calculated 
as 0.5 × length × width². All mice were euthanized 20 weeks after subcutaneous
injection. Tumours were then excised and photographed. 
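The caliper volume estimate, combined with the recording rules from the Figure 4e legend (non-palpable tumours as 0, palpable tumours below 4 mm³ as 4 mm³), can be sketched as follows. Treating `width_mm` as the shorter of the two measured dimensions is a common convention assumed here, and `tumour_volume` is a hypothetical helper.

```python
from typing import Optional


def tumour_volume(length_mm: Optional[float], width_mm: Optional[float],
                  floor_mm3: float = 4.0) -> float:
    """Caliper volume estimate, 0.5 * length * width**2, in mm^3.

    None for either dimension means no palpable tumour (recorded as 0).
    Palpable tumours whose estimate falls below `floor_mm3` are recorded
    at the floor value.
    """
    if length_mm is None or width_mm is None:
        return 0.0
    volume = 0.5 * length_mm * width_mm ** 2
    return max(volume, floor_mm3)
```

For example, a 4 mm × 2 mm tumour gives 0.5 × 4 × 2² = 8 mm³, while a 1 mm × 1 mm nodule is recorded as 4 mm³.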

ChIP-sequencing and RNA-sequencing analysis. Human cell sequencing reads 
were aligned to human genome hg18 using Bowtie2 (ref. 41). For p53 ChIP-seq, 
significant regions of enrichment (peaks) were called using HOMER (Salk 
Institute, http://homer.salk.edu). For area under the curve analysis, ChIP-seq tags 
from each cell line were counted at TSS proximal peaks (200 bp around peak 
centres) of every cell line (including itself) as indicated. Heat maps of p53 enrichment
across a 5 kb region (±2.5 kb from peak centre, bin = 10) in MCF7,
MDA-MB-175VII, MDA-MB-468, HCC70, BT-549 cell lines were generated using
HOMER and visualized using JavaTreeView. Sequencing reads from MEFs 
ChIP-seq experiments were aligned to the mouse reference genome mm9 using 
Bowtie2. Reads from strand-specific mouse RNA-seq experiments were aligned to the mm9
reference genome and reference transcriptome. FPKM expression values were 
counted for each exon and merged into a single gene model using HOMER. 
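HOMER builds the heat-map matrices internally. The binning it performs around each peak centre can be illustrated with a short pure-Python sketch. This is not HOMER's implementation: it assumes all coordinates lie on one chromosome, that `bin = 10` means 10 bp, and that each tag is represented by a single position. `tag_density_matrix` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Sequence


def tag_density_matrix(peak_centres: Sequence[int], tag_positions: Sequence[int],
                       half_width: int = 2500, bin_size: int = 10) -> List[List[int]]:
    """Count tags in fixed-width bins around each peak centre (one row per peak).

    Assumes a single chromosome and that bin_size divides 2 * half_width.
    """
    tags = sorted(tag_positions)
    n_bins = (2 * half_width) // bin_size
    matrix = []
    for centre in peak_centres:
        start = centre - half_width
        row = [0] * n_bins
        # Walk only the tags inside [start, start + 2 * half_width).
        i = bisect_left(tags, start)
        while i < len(tags) and tags[i] < start + 2 * half_width:
            row[(tags[i] - start) // bin_size] += 1
            i += 1
        matrix.append(row)
    return matrix
```

Each row is one heat-map line; averaging the rows column by column gives a meta-peak profile of the kind summarized by the area under the curve analysis.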
Motif analysis. To determine associated sequence motifs for wild-type p53 or 
GOF p53 peaks, all TSS proximal peaks (filtered to remove peaks overlapping 
with satellite DNA) were pared down to the central 50 bp and used as input to 
MEME and the SeqPos utility in Cistrome (central 100 bp as required by SeqPos). 
MEME was instructed to search for the top 10 motifs appearing 0 or more times in 
each sequence, and SeqPos was run with default parameters. 
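The preprocessing step before motif discovery (remove peaks that overlap satellite repeats, then keep only a fixed-width window at each peak centre) can be sketched as below. Intervals are assumed to be half-open `[start, end)` on a single chromosome, and the helper names are hypothetical; the actual tools used were MEME and SeqPos.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def central_window(start: int, end: int, width: int = 50) -> Interval:
    """Return a `width`-bp window centred on the peak [start, end)."""
    mid = (start + end) // 2
    left = mid - width // 2
    return left, left + width


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def motif_input(peaks: List[Interval], satellites: List[Interval],
                width: int = 50) -> List[Interval]:
    """Drop peaks overlapping satellite DNA, then trim the rest to the centre."""
    kept = [p for p in peaks if not any(overlaps(p, s) for s in satellites)]
    return [central_window(s, e, width) for s, e in kept]
```

Calling `motif_input(peaks, satellites, width=100)` would produce the central 100 bp windows used as SeqPos input.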

Gene ontology analysis. GO terms associated with wild-type p53 or GOF p53 
binding sites were determined in the following way. ChIP-seq TSS proximal peaks 
were associated with the nearest ENSEMBL transcript and processed using DAVID. 
The FDR was controlled at 1% and GO terms with fewer than 5 associated transcripts 
or a fold-enrichment over the genomic background under fivefold were discarded. 
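DAVID reports its own FDR for each GO term. To illustrate the three filtering criteria together, the sketch below substitutes a Benjamini-Hochberg adjustment for DAVID's FDR (an assumption, not necessarily DAVID's exact procedure) and then applies the transcript-count and fold-enrichment cut-offs. The `GoTerm` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoTerm:
    name: str
    p_value: float
    n_transcripts: int
    fold_enrichment: float


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def filter_terms(terms: List[GoTerm], fdr: float = 0.01,
                 min_transcripts: int = 5, min_fold: float = 5.0) -> List[GoTerm]:
    """Keep terms with FDR <= 1%, >= 5 transcripts and >= fivefold enrichment."""
    qvals = benjamini_hochberg([t.p_value for t in terms])
    return [t for t, qv in zip(terms, qvals)
            if qv <= fdr and t.n_transcripts >= min_transcripts
            and t.fold_enrichment >= min_fold]
```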
Intersection with ENCODE transcription factor data sets. Transcription factor 
peak coordinates (hg18 assembly) were obtained from the ENCODE project repos- 
itory (http://www.encodeproject.org) in BED format. TSS proximal p53 ChIP-seq 
peak regions were intersected with all transcription factor binding-site data using 
BEDTools, with overlap inferred if a minimum of a single base pair was in common.
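The overlap rule (at least one shared base pair) can be sketched in pure Python for BED intervals, which are 0-based and half-open. This reproduces the kind of yes/no answer that `bedtools intersect -u` gives with its default minimum overlap; it is an illustration rather than the BEDTools algorithm. Per chromosome, intervals are sorted by start with a running maximum of their ends, so a single binary search answers each query. `build_index` and `has_overlap` are hypothetical names.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Tuple

BedRecord = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(records: List[BedRecord]) -> Index:
    """Per chromosome: sorted interval starts and running maximum of ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def has_overlap(peak: BedRecord, index: Index) -> bool:
    """True if the peak shares at least 1 bp with any indexed interval."""
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    k = bisect_left(starts, end)  # intervals [0, k) begin before the peak ends
    return k > 0 and max_ends[k - 1] > start
```

Because BED ends are exclusive, a peak starting exactly where a binding site ends shares no base and is not counted.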
TCGA analysis. Exome sequencing and RNA sequencing data sets were obtained 
from TCGA (https://tcga-data.nci.nih.gov/tcga/). Based on p53 mutational status 
from the exome sequencing data sets, we grouped tumour samples into: (1) wild 
type (tumours without detectable p53 mutation); (2) GOF (tumours with p53 
single missense mutation of R175H, R248Q, R248W, R249S or R273H); and (3) 
null (tumours with p53 nonsense mutations or frameshift truncations). Tumours 
with other types of p53 mutations (other missense mutations, inframe insertion/ 
deletion, or splicing mutations) were not included in further analysis, due to an 
unpredictable effect on the downstream chromatin regulators. Cancer types in which
more than 5% of samples fall into group 2 were included in the combined
analysis, in which RNA expression values were normalized to the wild-type group
median. For individual cancer type analysis, original RNA expression values
(normalized read counts or RPKM values) from TCGA data sets were used. 
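The combined TCGA analysis (select cancer types with more than 5% GOF samples, normalize expression to the wild-type median, then pool by group) can be sketched for a single gene as follows. Normalizing within each cancer type before pooling, and reading "more than 5%" as a strict inequality, are assumptions; the function names are hypothetical, and group labels are assumed to be "WT", "GOF" or "null" with excluded tumours already removed.

```python
from statistics import median
from typing import Dict, List, Tuple

Cohort = Tuple[List[float], List[str]]  # (expression values, group labels)


def gof_fraction(groups: List[str]) -> float:
    """Fraction of a cancer type's samples assigned to the GOF group."""
    return groups.count("GOF") / len(groups)


def normalize_to_wt_median(expr: List[float], groups: List[str]) -> List[float]:
    """Divide each sample's expression by the wild-type group median."""
    wt_median = median(e for e, g in zip(expr, groups) if g == "WT")
    return [e / wt_median for e in expr]


def combine(cohorts: Dict[str, Cohort],
            min_gof_fraction: float = 0.05) -> Dict[str, List[float]]:
    """Pool normalized values from cancer types with > 5% GOF samples."""
    pooled: Dict[str, List[float]] = {"WT": [], "GOF": [], "null": []}
    for expr, groups in cohorts.values():
        if gof_fraction(groups) <= min_gof_fraction:
            continue  # too few GOF tumours in this cancer type
        for value, group in zip(normalize_to_wt_median(expr, groups), groups):
            pooled[group].append(value)
    return pooled
```

The pooled lists can then be compared between groups, as in Fig. 5g.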
OICR-9429. OICR-9429 was developed using structure-guided medicinal
chemistry and peptide displacement assays starting from ‘Compound 3’ previously
reported in ref. 42, as part of the Chemical Probe Program of the Structural 
Genomics Consortium. OICR-9429 is highly specific for WDR5 and was 
shown to have >100-fold selectivity over 300 other chromatin ‘reader’ domains, 
methyl-transferases, and other non-epigenetic targets. The details of its structure, 
discovery and characterization are described in ref. 38. 
Extended Data Figure 1 | Distinct GOF p53 mutants have similar genome-wide binding patterns, but are different from that of wild-type p53. a, Heat maps showing the enrichment of p53 peaks (±2,500 bp around peak centre) identified from each cell line (rows) in all five cell lines (columns) examined by ChIP-seq. b-e, Area under the curve, meta-peak analysis showing GOF p53(R248W) or IgG ChIP-seq signal enrichment from MDAH087 cells over TSS-proximal peaks identified in MDA-MB-468 (b), HCC70 (c), MCF7 (d) and MDA-MB-175VII (e) cells.
Extended Data Figure 2 | GOF p53 genome-wide binding is in association
with ETS family proteins. a, Canonical ETS binding motif (top), and 
discovered motif from all TSS-proximal peaks in MDA-MB-468 predicted by 
MEME/TomTom (middle), or SeqPos (bottom). b, MEME/TomTom 
identified wild-type p53 motif from MDA-MB-175VII TSS-proximal peaks. 
c, GST pulldown of bacterially expressed GST or GST-ETS2 with in vitro 
translated wild-type p53 or p53(R175H). d, e, Co-immunoprecipitation at 
endogenous protein levels of ETS2 and GOF p53(R273H) (d) or wild-type p53 
(e) in MDA-MB-468 (d) or MCF7 (e) cells. f, g, Box plots showing overlap of
GOF p53 (f) TSS-proximal peaks from MDA-MB-468 cells or wild-type p53
(g) TSS-proximal peaks from MCF7 cells, with ETS family proteins (blue),
all other transcription factors (grey) or Pol II (white) peaks from ENCODE
ChIP-seq data sets. Whiskers on the box plots represent the inter-quartile 
range. Mann-Whitney U-tests were performed to compute significance. h, GO 
analysis of wild-type p53 TSS-proximal peaks (statistics are shown in 
Supplementary Table 1). 
Extended Data Figure 3 | UCSC Genome Browser views showing distinct wild-type p53 and GOF p53 binding patterns over representative canonical wild-type p53 targets and novel GOF p53 targets. a-d, UCSC Genome Browser views of p53 occupancy over promoter regions of MLL1 (a), MLL2

e-g, Re-aligned GOF p53(R248W) and IgG ChIP-seq data from LFS MDAH087 cells, showing enrichment of GOF p53 at promoter regions of MLL1 (e), MLL2 (f), and MOZ (g). h-j, UCSC Genome Browser views of p53 occupancy over promoter regions of RBBP5 (h), MDM2 (i) and PUMA (j), in
Extended Data Figure 4 | ChIP-qPCR validation of GOF p53 binding at newly identified chromatin regulator genes. a, Schematic of amplicon locations for ChIP-qPCR validations performed in this study. b, c, ChIP-qPCR showing p53 (DO-1 antibody) or IgG (mouse) enrichment (ChIP/input) over MLL1, MLL2 and MOZ peak regions, in BT-549 (b) and HCC70 (c) cells. d-f, ChIP-qPCR showing p53 (DO-1 antibody) or IgG (mouse) enrichment over OGT, PPP1CC, RBBP5, SMARCD2, and DCAF10 peak regions in BT-549 (d), HCC70 (e) and MDA-MB-468 (f) cells. g, h, ChIP-qPCR showing p53 (FL393 antibody) or IgG (rabbit) enrichment over MDM2, CDKN1A, MLL1, MLL2 and MOZ regions, in MDA-MB-468 (g) and MDA-MB-175VII (h) cells. i, ChIP-qPCR showing p53 (DO-1 antibody) or IgG (mouse) enrichment over MLL1, MLL2 and MOZ peak regions in PANC-1 cells. Error bars represent mean ± s.e.m.; n = 3; two-tailed Student’s t-test: *P < 0.05; **P < 0.01; ***P < 0.001.
Extended Data Figure 5 | GOF p53 regulates expression of MLL1, MLL2, and MOZ, and corresponding histone post-translational modifications in cancer cells. a, b, RT-qPCR analysis measuring mRNA level changes upon siRNA-mediated GOF p53 knockdown in MDA-MB-468 cells (a), and shRNA-mediated wild-type p53 knockdown in MDA-MB-175VII cells (b). c, d, RT-qPCR analysis of mRNA levels (c), and western blot analysis of protein levels upon DMSO or nutlin treatment in MCF7 cells (d). e, Western blot analysis of MLL1 protein level upon shRNA-mediated wild-type p53 knockdown in MDA-MB-175VII cells. f, Western blot analysis of MOZ protein level change upon shRNA-mediated GOF p53 knockdown in MDA-MB-468 cells. g, RT-qPCR measuring mRNA level changes upon shRNA-mediated ETS2 knockdown in MDA-MB-468 cells. h, i, RT-qPCR measuring mRNA levels (h) and western blot measuring protein levels (i) upon shRNA-mediated ETS2 knockdown in BT-549 cells. j, k, RT-qPCR measuring mRNA level changes upon shRNA-mediated ETS1 knockdown in BT-549 (j) and MDA-MB-468 (k) cells. Numbers 89 and 91 denote two short hairpins targeting ETS1, sequences of which are shown in Supplementary Table 3. l, m, ChIP-qPCR showing p53 occupancy (l) and Pol II occupancy (m) upon shRNA-mediated ETS1 knockdown in MDA-MB-468 cells. n, o, Western blot analysis of histone methylation and acetylation level changes upon siRNA-mediated (n) or shRNA-mediated (o) knockdown of GOF p53 in MDA-MB-468 cells. p, Western blot analysis of histone methylation and acetylation level changes upon GOF p53 knockdown in PANC-1 cells. q, Western blot of H3K9ac change upon MOZ knockdown in MDA-MB-468 cells. Uncropped blots are shown in Supplementary Fig. 1. Error bars represent mean ± s.e.m.; n = 3; two-tailed Student’s t-test; *P < 0.05; **P < 0.01; ***P < 0.001.
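These legends report error bars as mean ± s.e.m. with n = 3, and significance as a two-tailed Student's t-test. The minimal Python sketch below computes those two quantities. It assumes equal variances in the two groups, which is the classic Student's form rather than Welch's. The replicate values are hypothetical and are not taken from any figure. The helper names `mean_sem` and `student_t` are illustrative only.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption).

    Returns (t, degrees_of_freedom); the two-tailed P value is then read from
    the t distribution with that many degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, n1 + n2 - 2


# Hypothetical relative mRNA levels (control shRNA vs knockdown), n = 3 each.
control = [1.00, 0.95, 1.05]
knockdown = [0.40, 0.35, 0.45]
print(mean_sem(control))
print(student_t(control, knockdown))
```

The sketch stops at the t statistic and its degrees of freedom. The two-tailed P value comes from the t distribution, for example via `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` corresponds to Student's test.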
Extended Data Figure 6 | GOF p53 regulates expression of Mll1, Mll2, and Moz, and corresponding histone post-translational modifications in primary MEFs. a, RT-qPCR analysis comparing Mll1 expression levels between MEFs bearing wild-type p53, GOF p53(R172H), and p53 null. b, Western blot comparing Mll1 protein level between MEFs with wild-type p53 and GOF p53. c, RT-qPCR analysis comparing Mll2 and Moz expression levels between MEFs bearing wild-type p53, GOF p53(R172H), and p53 null. d, RT-qPCR measuring mRNA changes upon shRNA-mediated p53 knockdown in GOF p53(R172H) knock-in MEFs. e, f, RT-qPCR analysis of mRNA levels (e) and western blot of protein levels (f) upon retroviral expression of GOF p53(R172H) in MEFs with p53 knockout. g, Western blot comparing H3K4me3 and H3K9ac levels between MEFs with wild-type p53 and GOF p53(R172H). h, Western blot showing H3K4me3 and H3K9ac levels upon p53 knockdown in wild-type p53 MEFs. i, Growth curve analysis of wild-type p53 MEF proliferation upon shRNA-mediated p53 knockdown. j, k, Box plot analysis of RNA levels (left) and H3-normalized H3K4me3 levels (right) at previously discovered Mll1 target genes (j) or Hoxa cluster genes (k) compared with all genes, from RNA-seq and H3K4me3 ChIP-seq in MEFs with wild-type p53 or GOF p53(R172H). Plots are presented as ratios of GOF p53(R172H) values over wild-type p53 values. l, UCSC Genome Browser views of H3K4me3 enrichment (top) and RNA levels (bottom) of Cdkn1a, from H3K4me3 ChIP-seq and RNA-seq of MEFs with wild-type p53 or GOF p53(R172H). Tracks are presented as an overlay of wild-type p53 and GOF p53 signals. Blue denotes more enriched in wild-type p53, red denotes more enriched in GOF p53(R172H), black denotes overlap. m, Box plot of H3-normalized H3K4me3 levels over all gene TSSs, from H3K4me3 ChIP-seq in MEFs with wild-type p53 or GOF p53(R172H). n, RT-qPCR analysis comparing Hox gene expression levels between MEFs bearing wild-type p53, GOF p53(R172H), and p53 null. Uncropped blots are shown in Supplementary Fig. 1. For all bar graphs, two-tailed Student’s t-test; *P < 0.05; **P < 0.01; ***P < 0.001. Error bars represent mean ± s.e.m.; n = 3. For all box plots, Mann-Whitney U-test; *P < 0.05; **P < 0.01; ***P < 0.001.
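The box-plot comparisons in these legends use the Mann-Whitney U-test. The sketch below illustrates the statistic itself, not how the authors computed it. Over every pair of values, it counts how often one sample exceeds the other, scoring ties as one half. The example values are hypothetical ratios, and `mann_whitney_u` is an illustrative name.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Counts, over all (xi, yj) pairs, how often xi > yj, with ties scored 0.5.
    U(x, y) + U(y, x) always equals len(x) * len(y).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical per-gene ratios (GOF over wild-type) for a target set vs all genes.
targets = [1.8, 2.1, 1.5, 2.4]
background = [1.0, 0.9, 1.2, 1.6, 1.1]
print(mann_whitney_u(targets, background), mann_whitney_u(background, targets))
```

A P value also requires the null distribution of U: exact enumeration for small samples, or a normal approximation for large ones. `scipy.stats.mannwhitneyu` offers both. The pairwise double loop is O(nm) and chosen for readability; rank-based formulas give the same U faster.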
Extended Data Figure 7 | MLL knockdown reduces proliferation and cancer phenotype specifically in GOF p53 cancer cells. a, b, Growth curve analysis of MDA-MB-468 (a) and MDA-MB-175VII (b) cells with either non-targeting control shRNA or p53 shRNA knockdown. c, d, Growth curve analysis of MDA-MB-468 (c) and MDA-MB-175VII (d) cells with non-targeting control shRNA, MLL1 shRNA, or MLL2 shRNA knockdown. e, Growth curve analysis of MCF7 cells with non-targeting control shRNA or MLL1 shRNA knockdown. f, g, Colony-formation assay of MDA-MB-468 (f) and MCF7 (g) cells with either non-targeting control shRNA or MLL1 shRNA knockdown, corresponding to Fig. 4a, b. h, i, Colony-formation assay of BT-549 (h) and PANC-1 (i) cells with either non-targeting control shRNA or two different MLL1 shRNA knockdowns, and quantification by crystal violet staining over three biological replicates. Reduction of MLL1 protein is also shown by western blot. j, k, Anchorage-independent soft agar assay of MDA-MB-468 (j) and MCF7 (k) cells with either non-targeting control shRNA or MLL1 shRNA knockdown. Dashed boxes denote enlarged images of the selected areas. White arrows indicate visible colonies in j. Quantifications are shown as number of visible colonies. Error bars represent mean ± s.e.m.; n = 3; two-tailed Student’s t-test; **P < 0.01; ***P < 0.001.
Extended Data Figure 8 | MLL knockdown reduces proliferation specifically of GOF p53 MEFs and LFS cells. a, Growth curve analysis of GOF p53(R172H) MEFs with either non-targeting control shRNA or two different Mll1 shRNA knockdowns. b, Western blot analysis of MLL1 levels upon shRNA-mediated knockdown in LFS MDAH087 and MDAH041 cells. c, Western blot analysis of p53 protein levels in LFS MDAH087 and MDAH041 cells. d, e, Growth curve analysis of LFS MDAH087 cells upon MLL1 (d) knockdown or p53 (e) knockdown. f, Growth curve analysis of LFS MDAH041 cells upon MLL1 knockdown. g, h, Western blot analysis of MLL1 level (g) and growth curve analysis (h) of proliferation upon shRNA-mediated MLL1 knockdown in IMR90 cells. i, Growth curve analysis of LFS MDAH087 cells with non-targeting control shRNA plus empty vector, p53 shRNA plus vector, and p53 shRNA plus MLL1 expressing vector. j, k, Growth curve analysis of LFS MDAH087 (j) and LFS MDAH041 (k) cells with either non-targeting control shRNA or MLL2 shRNA knockdown.
Extended Data Figure 9 | TCGA RNA expression profile analysis. a-f, TCGA RNA expression profile of GOF p53 target genes (top), housekeeping genes (middle), and wild-type p53 target genes (bottom) in brain lower grade glioma (a), head and neck squamous cell carcinoma (b), bladder urothelial carcinoma (c), colon adenocarcinoma (d), oesophageal carcinoma (e) or pancreatic adenocarcinoma tumours (f) with wild-type p53 (blue), GOF p53 (orange), or p53 null (white). Expression values are normalized read counts (a-d, f), or RPKM values (e) from TCGA RNA-seq data sets. Mann-Whitney U-tests were performed to compute significance.

An atomic structure of human γ-secretase

Xiao-chen Bai¹*, Chuangye Yan²*, Guanghui Yang²*, Peilong Lu², Dan Ma², Linfeng Sun², Rui Zhou², Sjors H. W. Scheres¹ & Yigong Shi²


Dysfunction of the intramembrane protease γ-secretase is thought to cause Alzheimer’s disease, with most mutations derived from Alzheimer’s disease mapping to the catalytic subunit presenilin 1 (PS1). Here we report an atomic structure of human γ-secretase at 3.4 Å resolution, determined by single-particle cryo-electron microscopy. Mutations derived from Alzheimer’s disease affect residues at two hotspots in PS1, each located at the centre of a distinct four-transmembrane-segment (TM) bundle. TM2 and, to a lesser extent, TM6 exhibit considerable flexibility, yielding a plastic active site and adaptable surrounding elements. The active site of PS1 is accessible from the convex side of the TM horseshoe, suggesting considerable conformational changes in the nicastrin extracellular domain after substrate recruitment. Component protein APH-1 serves as a scaffold, anchoring the lone transmembrane helix from nicastrin and supporting the flexible conformation of PS1. Ordered phospholipids stabilize the complex inside the membrane. Our structure serves as a molecular basis for mechanistic understanding of γ-secretase function.


A hallmark of Alzheimer’s disease is accumulation of β-amyloid plaque in the brains of patients’. The intramembrane protease γ-secretase is thought to contribute to the development of Alzheimer’s disease by generating β-amyloid peptides (Aβs), particularly those that are prone to aggregation such as Aβ42 (refs 2-5). A mature γ-secretase consists of four components: presenilin, PEN-2, nicastrin, and APH-1 (ref. 6). Among these components, presenilin is responsible for the Aβ-producing proteolytic activity’*.

Presenilin comprises nine TMs, with the signature motifs YD on TM6 and GxGD on TM7 (ref. 8). During assembly of γ-secretase, presenilin undergoes an autocatalytic cleavage to produce an amino (N)-terminal fragment (NTF, comprising TMs 1-6) and a carboxy (C)-terminal fragment (CTF, comprising TMs 7-9)’. Among the 300 or so mutations derived from patients with familial Alzheimer’s disease (FAD), more than two-thirds are mapped to PS1, and about three dozen each are derived from PS2 and amyloid precursor protein (APP). Nicastrin contains a large extracellular domain (ECD) and a single TM; its ECD is heavily glycosylated and thought to recognize the N terminus of substrate protein'’'*. PEN-2 directly binds PS1 and is required for its autocatalytic maturation and protease activity'*”. APH-1 contains seven TMs and is indispensable for γ-secretase assembly'*””.

Our previous cryo-electron microscopy (cryo-EM) structure of human γ-secretase at 4.5 Å resolution led to identification of 19 TMs and construction of a partial atomic model for the ECD'’. Analysis of the crystal structures of the archaeal presenilin homologue PSH” and the nicastrin homologue from Dictyostelium purpureum (DpNCT)* yielded a tantalizing clue on TM assignment and an improved atomic model for the ECD. A subsequent cryo-EM structure of human γ-secretase at 4.3 Å resolution allowed assignment of all 20 TMs to the four components". In this study, we report the first atomic structure of an intact human γ-secretase, which allows visualization of the atomic structures of all four components and determination of the specific interactions that underlie γ-secretase assembly.


Structure determination of human γ-secretase

We performed cryo-EM single-particle analysis on the same sample in amphipols that was used to calculate our previous map’*. With the aim of reaching higher resolution, we used zero-loss energy-filtered imaging, a higher magnification, and a lower dose rate on the single-electron counting detector (see Methods). We also collected a larger data set. An initial set of 412,272 particle images yielded a 3.5 Å map. Subsequent three-dimensional (3D) classification that was focused on the TMs resulted in a more structurally homogeneous subset of 159,549 particles. This subset was used to calculate the final map at 3.4 Å resolution (Extended Data Figs 1 and 2). Our map displays excellent main-chain connectivity and side-chain densities for almost all residues (Extended Data Fig. 2a).

Twenty TMs were identified in the transmembrane region, including a highly mobile TM (TM2 of PS1) that is only visible as rod-shaped density in a 7 Å low-pass filtered map (Extended Data Fig. 2b). This low-pass filtered map also shows a second rod-shaped density in the PS1 cavity between TM3, TM5, and TM2, which we were unable to identify. As previously reported*”’, PS1 and APH-1 are located at the centre of the TM horseshoe. All seven TMs of APH-1 have exceptional density, with most side chains clearly identifiable (Extended Data Fig. 2c). Except for TM2 and TM6, all other TMs of PS1 exhibit excellent density, with aromatic and bulky residues easily recognizable (Extended Data Fig. 2d). The three TMs of PEN-2 on the thin end of the TM horseshoe and the lone TM of nicastrin on the thick end both display discernible side-chain features (Extended Data Fig. 2e, f). Several N-linked glycans and lipid molecules are also defined by clear EM density (Extended Data Fig. 2g, h).

On the basis of these unambiguous EM densities, we built and refined a near-complete atomic model for human γ-secretase (Fig. 1 and Extended Data Table 1), which includes 598 residues in the TMs and 632 residues in the ECD. Assignment of specific residues in the TMs was aided by a large number of aromatic amino acids (Extended Data Fig. 2c-f). The density map for PS1-TM2 is inadequate for model building; nonetheless, we generated an atomic model for this TM on the basis of its sequence homology to the archaeal homologue PSH”. This is the first structure of human γ-secretase at a near-atomic resolution (Fig. 1 and Extended Data Fig. 3).

Figure 1 | Atomic structure of human γ-secretase. a, The γ-secretase structure is shown in cartoon representation (left) and surface view (right). Eleven N-linked glycans are displayed in stick. b, The γ-secretase structure is viewed perpendicular to the lipid membrane from the intracellular side. TM2 of PS1 is most flexible and shown in a semi-transparent fashion. The catalytic residues Asp257 and Asp385 are located on the convex side of the TM horseshoe. All structural figures were prepared using UCSF Chimera** or PyMol’*.

Nicastrin ECD, which constitutes the bulk of the extracellular region of human γ-secretase, contains 11 glycans and directly interacts with both ends of the TM horseshoe (Fig. 1a). The lone TM of nicastrin closely stacks against TM1/TM5/TM7 of APH-1 through predominantly hydrophobic interactions at the thick end. TM2 and TM4 of APH-1 associate with TM8 and TM9 of PS1 at the centre of the TM horseshoe, with the C-terminal three residues Phe465-Tyr466-Ile467 of PS1 inserting into a greasy central cavity on the extracellular side of APH-1 (Fig. 1a). The three TMs of PEN-2 bind to TM4 of PS1 at the thin end. Notably, this arrangement places the two catalytic residues, Asp257 and Asp385 of PS1, on the convex side of the TM horseshoe (Fig. 1b).


Atomic structure of presenilin 


PS1 exhibits an extended organization, with empty spaces between a few adjacent TMs (Fig. 2a and Extended Data Fig. 4). The N-terminal 77 residues have no EM density and are presumably disordered owing to intrinsic flexibility. Among the seven surface loops that connect neighbouring TMs, four exhibit clear and contiguous density. The extended sequences between TM6 and TM7 harbour the site of autocatalytic cleavage, but these sequences are hydrophilic and mostly disordered (Fig. 2b). The nine TMs vary considerably in length, with TM9 containing 30 residues and TM7 only 18.

Figure 2 | Atomic structure of PS1. a, PS1 has a loosely organized structure and exhibits considerable flexibility. The cartoon representation of PS1 is rainbow-coloured. TM2 is visible only at low resolutions, and the density map contains no features for side chains. Nonetheless, an atomic model for TM2 was built on the basis of sequence and structural homology between PS1 and PSH’? b, A membrane topology diagram of PS1. The two catalytic aspartate residues are coloured red. c, The two catalytic aspartate residues of PS1 are in near-perfect registry with those in PSH’. The PAL sequence motif implicated in substrate recognition is shown. d, PS1 and PSH share similar features at their active sites.

Of the two catalytic residues, Asp257 is located in the middle of TM6, slightly towards the extracellular side, whereas Asp385 maps to the cytoplasmic side of TM7 (Fig. 2b). The distance between the Cα atoms of Asp257 and Asp385 measures approximately 10.6 Å (Fig. 2c), considerably longer than that in an activated aspartate protease such as pepsin’*. Importantly, however, these catalytic residues are placed next to the PAL signature motif on TM9 that is thought to play a role in substrate recognition’**° (Fig. 2c). We speculate that substrate binding may trigger alignment of these two aspartate residues and consequent catalysis.
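The 10.6 Å figure is a plain Euclidean distance between two Cα atoms. The minimal sketch below assumes that Cα coordinates in ångströms have already been read from a model file. The coordinates are made up so that the result comes out at 10.6 Å. They are not the deposited PS1 coordinates.

```python
import math

# Hypothetical Cα coordinates in ångströms (x, y, z); NOT the real PS1 model.
ca_asp257 = (12.0, 4.0, 20.0)
ca_asp385 = (17.6, 13.0, 20.0)


def ca_distance(a, b):
    """Euclidean distance between two atom positions (math.dist, Python 3.8+)."""
    return math.dist(a, b)


print(round(ca_distance(ca_asp257, ca_asp385), 1))  # 10.6
```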

Despite a relatively low sequence identity of 19% between PS1 and the archaeal intramembrane protease PSH, their overall structures are similar to each other’’”’. In particular, the catalytic residues are in nearly perfect registry between PS1 and PSH (Fig. 2c). The amino acids that surround the catalytic residues, including the PAL motif, are also highly conserved. Relative to Asp257, three residues of PS1 (Ile253, Tyr256, and Val261) are located on the same side of TM6; these residues are replaced by Leu158, Tyr161, and Val166, respectively, in PSH (Fig. 2d). Similarly, Gly382 and Phe388 of PS1 correspond to Gly219 and Met223 of PSH, respectively. Ile439 on TM9 of PS1 and Ile283 of PSH nearly coincide with each other (Fig. 2d). The observed structural conservation may underlie the finding that PSH exhibits similar cleavage preferences towards APP C99 as human γ-secretase”®.
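A sequence-identity figure such as the 19% quoted above depends on the alignment and on the chosen denominator. The sketch below assumes a precomputed pairwise alignment. It uses one common convention: identical residues divided by the number of aligned columns without gaps. The text does not say which convention the authors used. The fragments are invented for illustration and are not PS1 or PSH sequence, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Convention used here (one of several in use): identical residues divided
    by the number of columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments (not the real PS1/PSH sequences).
print(percent_identity("LIYDGDS-LAV", "LVYDG-SKLAI"))
```

Dividing by the full length of the shorter sequence, or by the alignment length including gaps, would give a different number for the same alignment. This is why identity values are only comparable under a stated convention.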


Mutational hotspots in PS1 structure


Figure 3 | Alzheimer’s disease-derived mutations map to two hotspots in PS1. a, An overall view of the PS1 residues targeted for mutations in patients with Alzheimer’s disease. PS1 is viewed from the extracellular side. Mutated residues are coloured orange. b, Close-up views of the mutation-targeted residues in TMs 2-5. Most of these residues map to the centre of this four-TM bundle. c, Close-up views of the mutation-targeted residues in TMs 6-9. The two catalytic residues Asp257 and Asp385 are shown. d, FAD-derived mutations in PS1 have varying degrees of effect on the combined Aβ40 and Aβ42 cleavage activity of γ-secretase. Shown here are results of ten such γ-secretase mutants, each containing a specific mutation derived from FAD. The activity of WT γ-secretase is normalized to 1.0. e, FAD-derived mutations either suppressed the production of Aβ40 more than Aβ42 or increased the production of Aβ40 less than Aβ42. f, All but two FAD-derived mutations led to increased Aβ42/Aβ40 ratios. Two mutations in PS1, F237I and V261F, abrogated Aβ40 cleavage altogether, disallowing calculation of the Aβ42/Aβ40 ratio. Each experiment was independently repeated three times. Error bars, s.d.

The PS1 structure at a near-atomic resolution allows mapping and analysis of disease-derived mutations in PS1. On the basis of public information (http://www.alzforum.org/mutations), the 212 Alzheimer’s-disease-derived missense mutations in PS1 affect 135 amino acids, of which 53 are targeted for two or more mutations. Presumably these residues affect the specific function of PS1 that is directly related to development of Alzheimer’s disease. Among these 53 residues, 35 are identified in the TMs of our current PS1 structure, and the rest are located on surface elements that have no EM density. Notably, there are eight such residues in TM5, seven in TM6, but none in TM1. Together, these 35 amino acids account for a total of 101 Alzheimer’s-disease-derived missense mutations.

Analysis of these 35 residues on the structure of PS1 led to identification of two mutational hotspots, each located at the inner core of a structural repeat (Fig. 3a). The first hotspot involves the inner core of TMs 2-5 (Fig. 3b). Among the 20 affected residues in these TMs, 18 have their side chains facing the inner core of TMs 2-5. Consequently, only one side of each TM helix is affected. For example, Leu219, Glu222, Leu226, Ser230, Met233, and Phe237 are placed on the same side of TM5 (Fig. 3b). Similarly, the second hotspot is located at the inner core of TMs 6-9, in the vicinity of the catalytic residues Asp257 and Asp385 (Fig. 3c).

To examine the functional consequence of disease-derived mutations, we generated ten γ-secretase variants, each containing a distinct missense mutation in PS1. Among these mutations, seven map to the two mutational hotspots. These γ-secretase variants were individually purified to homogeneity and examined for their protease activities towards the APP C99 substrate in an in vitro cleavage assay. Compared with the wild-type (WT) γ-secretase, four mutations (I202F, F237I, L248R, and V261F) led to severely compromised protease activity (Fig. 3d). Of these four mutations, F237I and V261F failed to generate any detectable amount of Aβ40 (Fig. 3e), disallowing calculation of the Aβ42/Aβ40 ratio (Fig. 3f). In contrast, three mutations (I143V, F177L, and M233L) had relatively little impact on the total cleavage activity of γ-secretase (Fig. 3d), whereas the remaining three mutations (I213L, L226F, and L424V) actually increased the total cleavage activity (Fig. 3d). These observations strongly suggest a disconnect between the total protease activity of γ-secretase and the development of FAD, the disease in which these mutations were identified. All eight mutations for which the Aβ42/Aβ40 ratio can be calculated led to increased Aβ42/Aβ40 ratios compared with WT γ-secretase (Fig. 3f). The generally increased ratios of Aβ42 over Aβ40 may suggest a causal role in the development of FAD, but other explanations remain possible.
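The statement that Leu219, Glu222, Leu226, Ser230, Met233, and Phe237 lie on one side of TM5 can be checked against an idealized α-helix. Such a helix advances about 100° per residue, or 3.6 residues per turn. The sketch below is only a helical-wheel approximation. It assumes a regular, unkinked helix and ignores the real side-chain geometry in the model. The helper names are illustrative.

```python
def wheel_angle(residue, reference, degrees_per_residue=100):
    """Angle (degrees) of a residue on an ideal helical wheel, relative to a reference residue."""
    return ((residue - reference) * degrees_per_residue) % 360


def smallest_arc(angles):
    """Width (degrees) of the smallest arc of the circle that contains every angle."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + 360 - ordered[-1])  # wrap-around gap
    return 360 - max(gaps)


tm5_residues = [219, 222, 226, 230, 233, 237]  # Leu219 ... Phe237, from the text
angles = [wheel_angle(r, 219) for r in tm5_residues]
print(angles)                # [0, 300, 340, 20, 320, 0]
print(smallest_arc(angles))  # 80
```

Relative to Leu219, the six residues sit at 0°, 300°, 340°, 20°, 320°, and 0°, all within an 80° arc. Their spacing of 3, 4, 4, 3, 4 residues is the familiar i, i+3/i+4 pattern of a single helix face, consistent with the observation in the text.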
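The normalization in Fig. 3d (WT = 1.0) and the Aβ42/Aβ40 ratio in Fig. 3f are simple arithmetic with one edge case. When no Aβ40 is detected, as for F237I and V261F, the ratio is undefined. The minimal sketch below uses hypothetical signal readings, not the measured values. It assumes that the combined activity is the sum of Aβ40 and Aβ42 signals. The names `readings` and `summarize` are illustrative.

```python
def ab42_ab40_ratio(ab42, ab40):
    """Aβ42/Aβ40 ratio, or None when no Aβ40 is detected (the ratio is then undefined)."""
    if ab40 <= 0:
        return None
    return ab42 / ab40


def summarize(readings, reference="WT"):
    """Return {variant: (combined activity relative to reference, Aβ42/Aβ40 ratio)}."""
    ref = readings[reference]
    ref_total = ref["ab40"] + ref["ab42"]
    summary = {}
    for name, r in readings.items():
        total = (r["ab40"] + r["ab42"]) / ref_total  # reference variant scores 1.0
        summary[name] = (total, ab42_ab40_ratio(r["ab42"], r["ab40"]))
    return summary


# Hypothetical readings in arbitrary units; not data from this study.
readings = {
    "WT": {"ab40": 100.0, "ab42": 10.0},
    "mutant_A": {"ab40": 60.0, "ab42": 12.0},
    "mutant_B": {"ab40": 0.0, "ab42": 3.0},  # no detectable Aβ40
}

for name, (total, ratio) in summarize(readings).items():
    print(name, round(total, 2), ratio)
```

Returning `None` rather than raising keeps variants with undetectable Aβ40 in the summary. It also flags that their ratio cannot be compared, mirroring how Fig. 3f omits those two mutants.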


Atomic structure of nicastrin 


Similar to DpNCT”, human nicastrin also contains a large lobe, a small lobe, and a lone TM (Fig. 4a and Extended Data Fig. 5). These two nicastrin structures can be aligned to each other with a root-mean-square deviation of 2.2 Å. Of the 230 kDa molecular mass in mature γ-secretase, up to 70 kDa can be attributed to glycosylation of nicastrin’’”. Among the 16 predicted N-linked glycosylation sites in nicastrin, at least 11 are glycosylated, as judged by the EM density. Because the extended glycans are flexible in solution, only a small portion close to each Asn residue was modelled (Fig. 4a).
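The 2.2 Å value above is a root-mean-square deviation after optimal superposition. Take N paired atoms, with coordinates x_i in one structure and y_i in the other. The standard definition minimizes over a rotation R and a translation t. Which atoms were paired is an assumption here, for example the Cα atoms of structurally aligned residues, because the text does not say.

```latex
\mathrm{RMSD} = \min_{R \in SO(3),\; \mathbf{t} \in \mathbb{R}^{3}}
\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```

The optimal R is commonly found with the Kabsch (SVD-based) algorithm after both coordinate sets are centred on their centroids. The value therefore depends on the residue pairing as well as on the coordinates.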

Nicastrin is thought to rely on Glu333 and Tyr337 for substrate recruitment!’-. In the structure of human nicastrin, Glu333 and Tyr337 are buried in a hydrophilic pocket that is covered by an extended surface loop known as the lid’? (Fig. 4b). The lid, containing five aromatic residues, is sandwiched between two prominent glycans on Asn55 and Asn435 (Fig. 4b). Several charged and polar amino acids are located in the pocket, including four arginine residues: Arg281, Arg285, Arg429, and Arg432 (Fig. 4c). With the potential to mediate specific interactions such as hydrogen bonds, buried charged and/or polar residues often serve a functional role. These structural features support the notion that this pocket is responsible for substrate recruitment, with these residues directly contributing to recognition.

Figure 4 | Structural features of nicastrin. a, Two perpendicular views of nicastrin. The lid from the small lobe is highlighted in red. Surface glycans are shown. b, The lid hovers above a hydrophilic pocket in the large lobe. Two large glycans on Asn55 and Asn435 sandwich the lid and interact with surrounding residues. c, Glu333 and Tyr337 are surrounded by several charged and polar residues in the pocket. These structural features are consistent with the pocket being a binding site for substrate protein. d, Trp164 from the lid makes van der Waals contacts to Pro424, Phe448, and the aliphatic side chain of Gln420. e, Phe287 from the large lobe may serve as the hydrophobic pivot. Phe287 interacts with four hydrophobic residues from the small lobe.

The closed conformation of the lid is sustained by specific interactions between residues from the lid and residues from surrounding structural elements. For example, the indole ring of Trp164 makes several van der Waals contacts to the side chains of Pro424 and Phe448 and the aliphatic portion of Gln420 (Fig. 4d). Substrate binding requires opening of the lid and disruption of these interactions. Because the lid comes from the small lobe, a rotation of the large lobe around a central pivot, Phe287, is proposed to be both necessary and sufficient for lid opening”. Such a rotation around Phe287 would be greatly facilitated by hydrophobic interactions, which tolerate conformational changes better than hydrogen bonds do. Consistent with this analysis, Phe287 is nestled in a greasy pocket formed by Phe103, Leu171, Phe176, and Ile180 from the small lobe (Fig. 4e).


Inter-component interactions

Assembly of the four components into a functional human γ-secretase involves specific interactions mainly in the membrane-spanning region, resulting in the burial of approximately 3,794 Å² of otherwise membrane-exposed surface area (Fig. 5a). Binding of the ECD onto the TMs of γ-secretase buries an additional 1,320 Å² of surface area. The intra-membrane interactions comprise predominantly van der Waals contacts among hydrophobic amino acids, exemplified by the insertion of three residues at the C terminus of PS1 into a greasy pocket formed by TM2/3/4/6/7 of APH-1 (Fig. 5b). Phe465 from PS1 contacts Leu72 and Phe205 in APH-1, whereas Tyr466 from PS1 stacks against Leu163 and Ile167 in APH-1. The aliphatic side chain of Ile467 from PS1 interacts with Leu46, Leu86, and Phe125 at the bottom of the cavity in APH-1. The C terminus of PS1 is in close proximity to the highly conserved His171 and His197 in APH-1, which were found to be important for APH-1 binding to γ-secretase”’. Interestingly, a peptide comprising the C-terminal eight residues of PS1 was shown to inhibit Aβ production”, presumably by weakening or disrupting normal assembly of γ-secretase.
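Buried surface areas such as the 3,794 Å² and 1,320 Å² quoted above are usually derived from solvent-accessible surface areas (SASA). These are computed for the separated components and for the complex. The sketch assumes those SASA values are already available from a surface-area program, and the numbers are hypothetical. Some studies report half of the total, that is, the area buried per partner, so the convention here may differ from the one behind the values in the text. `buried_surface_area` is an illustrative name.

```python
def buried_surface_area(sasa_parts, sasa_complex):
    """Total surface area (Å²) buried on assembly: sum of isolated-part SASAs minus complex SASA."""
    buried = sum(sasa_parts) - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the summed SASA of its parts")
    return buried


# Hypothetical SASA values in Å² for two components and their complex.
total = buried_surface_area([12000.0, 9000.0], 19000.0)
print(total)      # 2000.0
print(total / 2)  # 1000.0 under the per-partner convention
```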

On the thin end of the TM horseshoe, an α-helix and its preceding and ensuing loops in nicastrin stack against an extended loop at the C-terminal end of PEN-2 through multiple van der Waals contacts (Fig. 5c). At the nearby interface between PS1 and PEN-2, Phe94 from PEN-2 is nestled in a hydrophobic pocket formed by six residues in PS1 (Fig. 5d). Hydrophobic residues from TM3 of PEN-2, exemplified by Leu71, Ile75, and Phe78, also interact with residues in PS1. Notably, although PEN-2 has only three TMs and interacts with both PS1 and nicastrin, it actually contains three hydrophobic structural cores (Extended Data Fig. 6).

Figure 5 | Assembly interfaces among the four components of γ-secretase in the transmembrane region. a, An overall view of the packing interfaces in the transmembrane region. The three boxed interfaces are detailed in b-d. b, The C-terminal three residues Phe465-Tyr466-Ile467 of PS1 insert into a cavity formed by TMs in APH-1. c, Nicastrin interacts with PEN-2 through van der Waals contacts on a flat interface. d, PEN-2 binds to PS1 mostly through van der Waals contacts. In particular, Phe94 of PEN-2 is nestled in the greasy pocket of PS1, formed by Phe179, Leu182, Phe186, Val193, Tyr195, and Val198. e, Two phospholipids appear to stabilize the inter-component interfaces in γ-secretase. One lipid is bound at the interface between PS1 and APH-1, whereas the other is intercalated between the lone TM of nicastrin and TMs 1/4/5/7 of APH-1 (left). The aliphatic tails of the latter phospholipid may interact with several hydrophobic residues, whereas its phosphate group likely hydrogen bonds to Arg115 and Gln116 (right).

10 SEPTEMBER 2015 | VOL 525 | NATURE | 215
©2015 Macmillan Publishers Limited. All rights reserved
ARTICLE

At 3.4 Å resolution, at least two phospholipid molecules were identified, each with two aliphatic tails linked to a small head group (Extended Data Fig. 2h). One lipid binds to the interface between PS1 and APH-1, making close contacts with residues in TM1/TM8 of PS1 and TM4 of APH-1 (Fig. 5e). The other lipid molecule is intercalated into the interface between APH-1 and the lone TM of nicastrin (Fig. 5e, right panel). The aliphatic tails of this phospholipid interact with several surrounding hydrophobic residues, whereas the phosphate group may directly hydrogen bond to the side chains of Arg115 and Gln116 from APH-1. This lipid probably stabilizes the nicastrin-APH-1 interface within the membrane.

To corroborate the observed inter-component interfaces, we generated eight mutant γ-secretase complexes, each containing two specific cysteine mutations on two neighbouring components to facilitate formation of designed disulfide bonds. Three such γ-secretase mutants target the interface between APH-1 and nicastrin (Extended Data Fig. 7a). The distance between the Cα atoms of Val147 on APH-1 and Ile40 on nicastrin is 4.1 Å, ideal for formation of a disulfide bond if these two residues were replaced by cysteine. Indeed, in the absence of the reducing agent DTT, APH-1 V147C was crosslinked to nicastrin I40C within the mutant γ-secretase, resulting in a high molecular mass complex on denaturing SDS-polyacrylamide gel electrophoresis (SDS-PAGE) (Extended Data Fig. 7a, lane 1). This crosslinked complex disappeared in the presence of DTT (lane 2) and was absent for the WT γ-secretase regardless of DTT (lanes 7 and 8). Similarly, two additional γ-secretase mutants, one containing APH-1 V146C and nicastrin A664C and the other containing APH-1 A4C and nicastrin L673C, allowed specific crosslinking only in the absence of DTT (lanes 3-6). The same strategy was successfully applied to verify the specific interactions at the interfaces between PS1 and PEN-2 (Extended Data Fig. 7b), between PEN-2 and nicastrin (Extended Data Fig. 7c), and between APH-1 and PS1 (Extended Data Fig. 7d).
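Choosing residue pairs for designed disulfides starts from inter-atomic distances such as the 4.1 Å Cα-Cα separation quoted for Val147 and Ile40. The rough screening sketch below assumes that per-residue Cα coordinates for two neighbouring components are at hand. It uses a hypothetical 6.5 Å cutoff. It ignores the Cβ and side-chain geometry that practical disulfide design also checks. Residue labels, coordinates and the function name are invented for illustration.

```python
import math
from itertools import product


def candidate_disulfide_pairs(ca_a, ca_b, max_ca_ca=6.5):
    """Return (label_a, label_b, distance) for residue pairs whose Cα atoms are within max_ca_ca Å.

    ca_a and ca_b map residue labels to Cα coordinates (x, y, z) for two
    neighbouring components. The 6.5 Å default is an illustrative assumption,
    not a published rule. Results are sorted from shortest to longest distance.
    """
    hits = []
    for (label_a, xyz_a), (label_b, xyz_b) in product(ca_a.items(), ca_b.items()):
        d = math.dist(xyz_a, xyz_b)
        if d <= max_ca_ca:
            hits.append((label_a, label_b, d))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical Cα coordinates (Å) for two components; labels are placeholders.
component_1 = {"a1": (0.0, 0.0, 0.0), "a2": (1.5, 3.8, 0.0)}
component_2 = {"b1": (4.1, 0.0, 0.0), "b2": (20.0, 0.0, 0.0)}

for label_a, label_b, d in candidate_disulfide_pairs(component_1, component_2):
    print(label_a, label_b, round(d, 2))
```

A distance screen like this only shortlists candidates. As the crosslinking experiments above illustrate, whether a designed pair actually forms a disulfide still has to be tested, for example by DTT-sensitive crosslinking on non-reducing SDS-PAGE.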


Discussion 


In this paper, we report the cryo-EM structure of human γ-secretase at an overall resolution of 3.4 Å. The qualitative improvement in resolution over earlier studies allowed us to derive an atomic model for all four components of γ-secretase. With the exception of nicastrin'*”°, these atomic models are reported here for the first time.

The available structural evidence supports the notion that Glu333 and Tyr337 of nicastrin, both buried in a hydrophilic pocket, may play a key role in substrate recruitment’. Under this scenario, displacement of the lid is required before substrate binding”, and this displacement is caused by a movement of the large lobe relative to the small lobe. The proposal is supported by the unique pattern of contacts between the large and small lobes and by the conserved interactions around the central pivot Phe287 in nicastrin (Fig. 4). In substrate-free γ-secretase, the lid sits right above the concave side of the TM horseshoe, whereas the active site is located on the convex side. As a result, the putative binding pocket for the N terminus of the substrate lies relatively far from the site of cleavage by PS1 (Extended Data Fig. 8). This distance may shorten, however, in response to conformational changes within γ-secretase. In particular, substrate binding may induce a rotation of the large lobe relative to the small lobe. Such a rotation may also reorient the substrate for cleavage, perhaps by aligning the pocket in nicastrin with the active site in PS1. Dynamic conformations of γ-secretase have been observed”, particularly in the ECD region*’.

TM6 of PS1, which harbours the catalytic residue Asp257, exhibits relatively poor EM density, whereas TM2 is visible only at low resolution. These structural observations are consistent with the notion that binding of inhibitors or modulators may induce pronounced conformational changes*®*’. In particular, substrate binding may trigger a conformational change that renders the active site suitable for catalysis*’. The plasticity observed in our structure may facilitate the conformational changes needed to bring the two catalytic aspartates within hydrogen-bonding distance of each other. It may also contribute to the relaxed substrate specificity of the complex. Activation of the active site further depends on the binding of PEN-2, which was observed to have an allosteric effect on TM6 (ref. 33).

The prevailing β-amyloid hypothesis identifies an increased ratio of Aβ42 over Aβ40 as the major culprit for the accumulation of β-amyloid plaque and the consequent development of Alzheimer’s disease””. We find that FAD-derived mutations affect the protease activity of γ-secretase to varying degrees. This argues against a causal relationship between the total protease activity of γ-secretase and the development of FAD, and it is consistent with the poor clinical performance of γ-secretase-inhibiting drugs**. Notably, the generally increased ratios of Aβ42 over Aβ40 only suggest, and do not prove, a direct causal role in the development of FAD. For example, suppose γ-secretase has evolved to optimize production of Aβ40 over all other peptides. That assumption alone predicts that any mutation, whether disease-causing or not, will increase the Aβ42/Aβ40 ratio.

Intriguingly, the mutations causing Alzheimer’s disease map to two hotspots, each located at the inner core of a four-TM bundle (Fig. 3). The mutations on TMs 6–9 affect residues in the immediate neighbourhood of the catalytic residues Asp257 and Asp385, and thus may cripple the protease activity of γ-secretase. The biochemical data seem to support this analysis (Fig. 3d). However, the mutations on TMs 2–5 defy such a rationale: they either abrogate or increase the protease activity of γ-secretase. The mutation-targeted residues on TMs 2–5 generally face inwards (Fig. 3b). This may suggest a transport function of some sort, or a binding site for another transmembrane protein. These tantalizing clues await experimental examination.

Our structure represents a milestone for the rapidly emerging technique of high-resolution cryo-EM structure determination. The signal-to-noise ratio in cryo-EM images correlates with the size of the particles. For small complexes, determining the relative orientations of individual particles therefore becomes a limitation. With a protein mass of ~170 kDa, γ-secretase is much smaller than any other structure determined by cryo-EM at atomic resolution so far, and its lack of symmetry further complicates structure determination. Glycosylation adds another 30–70 kDa of mass to the complex. However, only a small proportion of the sugar moieties are ordered, so their effect on alignment is limited. Glycosylation may inhibit crystal growth, but it did not seem to hamper cryo-EM structure determination. On the contrary, glycosylation may stabilize the protein, possibly by protecting it from the denaturing air–water interface of the thin cryo-EM sample.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein preparation. To improve protein production, we co-expressed the four components of γ-secretase, each in its own pMLink plasmid’*. This strategy allowed convenient manipulation of individual components. It also allowed optimization of protein expression by altering the relative molar ratios of the four plasmids. Specific mutations in PS1, PEN-2, APH-1 or nicastrin were generated only in the corresponding pMLink vector using a standard PCR-based approach.
PEN-2 has an amino-terminal Flag tag, APH-1 contains a carboxyl-terminal 
haemagglutinin (HA) tag, and nicastrin is tagged with a V5-Hisg sequence at 
the carboxyl terminus. Ten γ-secretase variants, each carrying an FAD-derived mutation on PS1, were overexpressed in the same way as WT γ-secretase’*. So were eight γ-secretase complexes, each carrying two introduced cysteine mutations. Purification of both WT and mutant human γ-secretases followed a published protocol’*.

Electron microscopy. Aliquots of 3 μl of purified γ-secretase in amphipols, at a concentration of ~4 μM, were placed on glow-discharged holey carbon grids (Quantifoil Au R1.2/1.3, 300 mesh) and flash frozen in liquid ethane using an FEI Vitrobot. Zero-energy-loss images were recorded manually on an FEI Titan Krios electron microscope at 300 kV, using a slit width of 20 eV on a GIF-Quantum energy filter. A Gatan K2-Summit detector was used in super-resolution counting mode at a calibrated magnification of ×35,714, which gives a pixel size of 1.4 Å. The dose rate was ~2.5 electrons per square ångström per second (approximately 5 electrons per pixel per second). Exposures of 16 s were dose-fractionated into 20 movie frames. Defocus values in the final data set ranged from 0.7 to 3.2 μm.
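A few lines of arithmetic show that the exposure parameters above are mutually consistent. The sketch only rearranges numbers quoted in this paragraph. The derived totals (about 40 electrons per Å² per exposure and 2 electrons per Å² per frame) are computed here and are not stated in the text.

```python
# Values quoted in the Electron microscopy paragraph.
dose_rate = 2.5        # electrons per Å^2 per second (approximate, "~2.5")
exposure_s = 16.0      # exposure time per movie, seconds
n_frames = 20          # movie frames per exposure
pixel_size = 1.4       # calibrated pixel size, Å

total_dose = dose_rate * exposure_s          # electrons per Å^2 per exposure
frame_time = exposure_s / n_frames           # seconds per frame
dose_per_frame = total_dose / n_frames       # electrons per Å^2 per frame
rate_per_pixel = dose_rate * pixel_size**2   # electrons per pixel per second

print(f"total dose     {total_dose:.1f} e-/Å^2")
print(f"frame time     {frame_time:.2f} s")
print(f"dose per frame {dose_per_frame:.1f} e-/Å^2")
print(f"rate per pixel {rate_per_pixel:.1f} e-/pixel/s")
```

The per-pixel rate of 2.5 × 1.4² ≈ 4.9 electrons per second matches the "approximately 5 electrons per pixel per second" given in the text.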

Image processing. We used MOTIONCORR” for whole-frame motion correction, CTFFIND3 (ref. 38) for estimation of the contrast transfer function parameters, and RELION-1.3 (ref. 39) for all other image processing steps. Templates for reference-based particle picking were obtained from 2D class averages calculated from a manually picked subset of the micrographs. To limit reference bias, the templates were low-pass filtered to 20 Å. With these templates, 1.8 million particles were picked automatically from a total of 2,925 micrographs. Because the picking procedure is prone to false positives*®, we used reference-free 2D class averaging and an initial run of 3D classification to select 412,272 particles for a first 3D refinement. After per-particle motion correction and radiation-damage weighting (particle polishing"’), these particles gave a reconstruction with a resolution of 3.5 Å. In a subsequent 3D classification, we applied a mask around the transmembrane domain and did not perform any alignments. This yielded a subset of 159,549 particles, which gave a reconstruction with improved density in the transmembrane domain. The resolution of this final map was 3.4 Å. Reported resolutions are based on the gold-standard FSC = 0.143 criterion’. FSC curves were corrected for the effects of a soft mask using high-resolution noise substitution’’. All 3D classifications and refinements were started from an initial model low-pass filtered to 40 Å; the first of these models was made from our previous 4.5 Å map. Before visualization, all density maps were corrected for the modulation transfer function of the detector. They were then sharpened by applying a negative B-factor estimated using automated procedures’. Local resolution variations were estimated using ResMap™.
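The gold-standard resolution estimate compares two independently refined half-maps shell by shell in Fourier space. It reports the resolution at which their correlation falls below 0.143. The sketch below is a minimal NumPy version of that calculation and is not RELION's implementation. It omits the soft-mask correction by high-resolution noise substitution mentioned above. It also picks the last shell above threshold instead of interpolating between shells, which is a simplifying assumption.

```python
import numpy as np


def fsc_curve(half1, half2, voxel_size):
    """Fourier shell correlation between two cubic half-maps of equal size.

    Returns (resolution in Å, FSC value) for Fourier shells 1 .. n//2 - 1.
    """
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    k = np.fft.fftfreq(n) * n                       # integer Fourier indices
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    resolutions, values = [], []
    for s in range(1, n // 2):
        mask = shell == s
        cross = np.sum(f1[mask] * np.conj(f2[mask])).real
        norm = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        values.append(cross / norm if norm > 0 else 0.0)
        resolutions.append(n * voxel_size / s)      # shell s <-> s / (n * voxel_size) per Å
    return np.array(resolutions), np.array(values)


def resolution_at(resolutions, fsc, threshold=0.143):
    """Resolution of the last shell before the FSC first drops below the threshold."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return resolutions[-1]        # never crosses: limited by the shells sampled
    if below[0] == 0:
        return None                   # no correlation even in the lowest shell
    return resolutions[below[0] - 1]  # no interpolation between shells


rng = np.random.default_rng(0)
volume = rng.normal(size=(32, 32, 32))
res, fsc = fsc_curve(volume, volume, voxel_size=1.4)  # identical maps: FSC = 1
print(resolution_at(res, fsc))
```

Feeding two identical maps gives an FSC of 1 in every shell. In that case the reported resolution is simply the finest shell sampled. With real half-maps, the curve falls off with spatial frequency, and the 0.143 crossing gives the quoted resolution.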
Atomic modelling. The first atomic model was built from an intermediate 3.9 Å EM map. This map was of sufficient quality to allow side-chain assignment for all four components of γ-secretase. The initial models for the nicastrin ECD and presenilin 1 were generated by CHAINSAW* from the crystal structures of DpNCT” (Protein Data Bank accession number 4R12) and PSH” (Protein Data Bank accession number 4HYG). The two generated structures were first refined in real space by PHENIX” with secondary structure and geometry restraints. APH-1 and PEN-2 were built de novo from a poly-Ala model. The entire atomic model was then manually improved in COOT’’”. Sequence assignment was guided mainly by bulky residues such as Phe, Tyr, Trp and Arg. Unique sequence patterns were exploited to validate the residue assignment. Eleven glycosylation sites with obvious sugar densities, and five pairs of disulfide bonds in the nicastrin ECD, also facilitated sequence assignment. The 3.4 Å resolution EM map largely confirmed and further improved this initial model. Only a few residues needed minor adjustment, and two additional glycosylation sites and two lipid molecules were identified. This model was refined in real space by PHENIX**.

An independent atomic model was built with stereochemical refinement tools in Coot*” and REFMAC*. These tools were originally designed for X-ray crystallography but were later adapted for cryo-EM*’. The structures of DpNCT” (Protein Data Bank accession number 4R12) and PSH’ (Protein Data Bank accession number 4HYC) were used as a guide, while APH-1 and PEN-2 were built de novo in Coot”, starting from idealized α-helices. This model was refined in REFMAC*, with secondary structure restraints generated by ProSMART*. Superposition of the two independently built models showed that they were in excellent agreement. The final, consensus model was refined in REFMAC. Overfitting was monitored by refining the model against one of the two independent maps from the gold-standard refinement and testing it against the other map*'. At no point in our procedures did we use the atomic model to modify the cryo-EM map.
γ-Secretase activity assay. The assay was performed as described in the AlphaLISA Kit (PerkinElmer). Briefly, 2 μl of reaction products were incubated with 8 μl of AlphaLISA Aβ1–40/42 Acceptor beads at 23 °C for 1 h. The samples were then incubated for a further 30 min with 10 μl of AlphaLISA Aβ1–40/42 Donor beads in the dark at 23 °C and read using an EnVision-Alpha Reader (PerkinElmer). The readings were expressed in arbitrary units.
Crosslinking assay. For the crosslinking assay, copper sulfate was added to the purified protein at a final concentration of 1 mM, and the sample was incubated at 4 °C for 1 h. The protein was then applied to gel filtration (Superose-6, GE Healthcare). The peak fractions were divided into two aliquots. One aliquot was treated with the reducing agent DL-dithiothreitol (DTT) at a final concentration of 10 mM and incubated at room temperature for 30 min. Samples from the two aliquots, with and without DTT, were analysed by western blotting. The blots used monoclonal antibodies against the HA tag on APH-1 or the Flag tag on PEN-2. The antibodies were purchased from Beijing ComWin Biotech.
Extended Data Figure 1 | Cryo-EM single-particle analysis of human γ-secretase. a, Representative raw particles from an original micrograph. b, Representative reference-free 2D class averages of the γ-secretase particles. Two classes marked by a red rectangle (lower right corner) may contain some density for the extended cytosolic loop between TM6 and TM7 of PS1, which is disordered in the final maps. c, Resolution estimation of the EM structure. The overall resolution is calculated to be 3.4 Å on the basis of the gold-standard FSC curve”’. d, Colour-coded resolution variations in the γ-secretase structure as estimated by ResMap“. e, FSC curves for three comparisons. In black, the final REFMAC-refined model versus the map it was refined against. In red, a model refined in the first of the two independent maps used for the gold-standard FSC versus that same map. In green, the same model versus the second independent map. The small difference between the red and green curves indicates that the refinement of the atomic coordinates did not suffer from severe overfitting.
Extended Data Figure 2 | An atomic model of human γ-secretase. a, EM density for the entire γ-secretase complex, viewed parallel to the lipid membrane. EM density is coloured blue for PS1, yellow for PEN-2, magenta for APH-1 and green for nicastrin. b, The density map for TM2 of PS1. Among the 20 TMs, TM2 of PS1 shows the highest degree of flexibility and becomes visible as rod-shaped density only in a map low-pass filtered to 7 Å. At this resolution, another rod-shaped density is visible next to TM2 and remains unaccounted for. c, EM density map and atomic model for all seven TMs of APH-1. Two to three bulky residues are indicated for each TM. d, EM density map and atomic model for seven TMs of human PS1. TM6 exhibits relatively poor EM density, probably because of its intrinsic flexibility. e, EM density map and atomic model for the three TMs of PEN-2. f, EM density map and atomic model for the only TM and selected regions of nicastrin. g, EM density map and atomic model for three representative glycans. h, EM density map and putative assignment for two lipid molecules.
Extended Data Figure 3 | Overall structure of human γ-secretase. a, Structure of human γ-secretase shown in cartoon representation (top) and surface view (bottom) in four successively perpendicular views. The γ-secretase structure is viewed parallel to the lipid membrane. The colouring scheme is the same as in Fig. 1. Two lipid molecules are shown. Eleven glycosylated Asn residues and their glycans are displayed as sticks. b, The γ-secretase structure represented by its electrostatic surface potential.


©2015 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


Colour key: electrostatic surface potential, from negative to positive.


Extended Data Figure 4 | Electrostatic surface potential of PS1. PS1 exhibits a loosely folded structure, with several large cavities and empty spaces between 
adjacent TMs. 
Extended Data Figure 5 | Structural comparison between human nicastrin and nicastrin from D. purpureum (DpNCT). Individual structures of human nicastrin and DpNCT” are shown in the left and middle panels, respectively. The overlay is shown in the right panel, with a root mean squared deviation of 2.2 Å. Two perpendicular views are displayed for each structure.
Extended Data Figure 6 | PEN-2 contains three small hydrophobic cores in its three TMs. Contrary to previous predictions’*”’, PEN-2 contains three TMs, not two. PEN-2 has one small hydrophobic core on the extracellular side and two in the transmembrane region. These three regions are boxed and shown in close-up views.
Extended Data Figure 7 | Results of crosslinking experiments corroborate the atomic model of γ-secretase. a, Crosslinking results for the interface between APH-1 and nicastrin (NCT). Three mutant γ-secretase complexes were examined: APH-1-V147C/NCT-I40C, APH-1-V146C/NCT-A664C and APH-1-A4C/NCT-L673C. The upper panel shows an SDS–PAGE gel blotted with a monoclonal antibody against the HA tag on APH-1. Crosslinking produced high-molecular-mass complexes for the mutant γ-secretase only in the absence of DTT, and never for the WT γ-secretase. The two bands probably represent APH-1 crosslinked to mature nicastrin (mNCT) and to immature nicastrin (iNCT). The structural basis is shown in the lower panel. The distances between the Cα atoms of the two residues targeted for cysteine mutation range from 4.1 to 6.1 Å, which favours crosslinking. b, Crosslinking results for the interface between PS1 and PEN-2. The mutant γ-secretase contains PEN-2-P97C and PS1-N190C. The upper panel shows an SDS–PAGE gel blotted with a monoclonal antibody against the Flag tag on PEN-2. c, Crosslinking results for the interface between PEN-2 and nicastrin. Two γ-secretase mutants were examined: PEN-2-T100C/NCT-V224C and PEN-2-L98C/NCT-H222C. d, Crosslinking results for the interface between APH-1 and PS1. Two γ-secretase mutants were examined: APH-1-T204C/PS1-F465C and APH-1-A76C/PS1-I467C.
Extended Data Figure 8 | Implications for substrate access to γ-secretase. The structure of γ-secretase is displayed in three relevant views. Left, electrostatic surface potential from the convex side of γ-secretase. Middle, overall structure, with key features labelled. Right, suggested putative path for substrate access to the active site of γ-secretase.
Dense cloud cores revealed by CO in the low
metallicity dwarf galaxy WLM 


Monica Rubio¹, Bruce G. Elmegreen², Deidre A. Hunter³, Elias Brinks⁴, Juan R. Cortés⁵,⁶ & Phil Cigan⁷


Understanding stellar birth requires observations of the clouds in which stars form. These clouds are dense and self-gravitating. In all existing observations they are molecular, with H₂ the dominant species and carbon monoxide (CO) the best available tracer’”. Sometimes the abundances of carbon and oxygen are low compared with that of hydrogen, and the opacity from dust is also low, as in primeval galaxies and local dwarf irregular galaxies’. Under those conditions CO forms slowly and is easily destroyed, so it is difficult for it to accumulate inside dense clouds*. Here we report interferometric observations of CO clouds in the Local Group dwarf irregular galaxy Wolf–Lundmark–Melotte (WLM)*. Its metallicity is 13 per cent of the solar value®*’, which is 50 per cent lower than the previous CO detection threshold. The clouds are tiny compared to the surrounding atomic and H₂ envelopes, but their densities and column densities are typical of CO clouds in the Milky Way. This normal CO density explains why star clusters forming in dwarf irregulars have densities similar to those of star clusters in giant spiral galaxies. The low cloud masses suggest that these clusters will also be low mass, unless some galaxy-scale compression occurs, such as an impact from a cosmic cloud or another galaxy. The massive, metal-poor globular clusters in the halo of the Milky Way are commonly believed to have formed in dwarf galaxies. If so, their formation was probably triggered by such an impact.

WLM is an isolated dwarf galaxy at a distance of 985 ± 33 kiloparsecs (kpc) (ref. 5). As in other dwarfs, the relative abundance of supernova-processed elements (‘metals’) such as carbon and oxygen is low®: 12 + log(O/H) = 7.8, compared with 8.66 for the Milky Way’. Carbon, oxygen, the other processed elements and dust are all scarce. This makes the CO molecule rare compared to H₂, which calls into question the standard model of star formation in CO-rich clouds’. In fact, the star formation rate®* relative to the existing stellar mass is high in WLM. Its 0.006 M☉ yr⁻¹ of new stars for a total stellar mass of 1.6 × 10⁷ M☉ is 12 times higher than in the Milky Way. There, the star formation rate’ is ~(1.9 ± 0.4) M☉ yr⁻¹ and the stellar mass is (6.4 ± 0.6) × 10¹⁰ M☉ (ref. 11). Thus, WLM forms stars efficiently even with a relatively low abundance of CO.
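The factor of 12 compares specific star formation rates, that is, star formation rate divided by stellar mass. The arithmetic below uses only the central values quoted in the paragraph above and ignores their uncertainties.

```python
# Star formation rates (Msun/yr) and stellar masses (Msun) quoted in the text.
wlm_sfr, wlm_mstar = 0.006, 1.6e7
mw_sfr, mw_mstar = 1.9, 6.4e10

wlm_ssfr = wlm_sfr / wlm_mstar   # specific star formation rate of WLM, 1/yr
mw_ssfr = mw_sfr / mw_mstar      # specific star formation rate of the Milky Way, 1/yr
ratio = wlm_ssfr / mw_ssfr
print(f"WLM {wlm_ssfr:.2e} /yr, Milky Way {mw_ssfr:.2e} /yr, ratio {ratio:.1f}")
```

The ratio comes out at about 12.6, consistent with the rounded "12 times" in the text.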

Metal-poor galaxies include the most numerous galaxies in the local Universe (the dwarfs) as well as all primeval galaxies. To understand star formation in them, we previously searched for CO(3–2) in WLM using the APEX telescope”. We detected it in two unresolved regions, at an abundance relative to H₂ half that in the galaxy with the next-lowest metallicity, the Small Magellanic Cloud. Now, with the completion of the new millimetre- and submillimetre-wavelength interferometer, the Atacama Large Millimeter Array (ALMA), we have imaged these two regions in CO(1–0) and resolved the actual molecular structure.

The ALMA maps have 6.2 × 4.3 pc spatial resolution (HPBW), 5 mJy sensitivity and 0.5 km s⁻¹ velocity resolution (FWHM). They contain 10 CO clouds with an average radius of 2 pc and an average virial mass of 2 × 10³ M☉. Figure 1 shows the CO emission as black contours superposed on H I in green and Hα in red. The inset is a colour composite of the optical V-band image in green, the far-ultraviolet (FUV) Galaxy Evolution Explorer (GALEX) image in blue and the H I in red. A [C II] 158 μm image from the Herschel Space Observatory’* is superposed on the southeast region in blue’. The [C II] emission arises in a photodissociation region containing ionized carbon. It is five times larger in size than the CO core, indicating a gradual transition from low-density atomic gas to high-density molecular gas.
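The quoted 6.2 × 4.3 pc resolution follows from two inputs: the 0.9″ × 1.3″ synthesized ALMA beam given in the Figure 1 caption, and the 985 kpc distance to WLM. The sketch below applies the small-angle approximation. It is a consistency check only, and the function name is illustrative.

```python
ARCSEC_PER_RADIAN = 206264.806  # arcseconds in one radian


def angular_to_linear(theta_arcsec, distance_pc):
    """Small-angle conversion of an angular size to a projected length in pc."""
    return distance_pc * theta_arcsec / ARCSEC_PER_RADIAN


distance_pc = 985e3          # WLM distance, 985 kpc
beam_arcsec = (1.3, 0.9)     # synthesized beam axes, arcsec
beam_pc = [angular_to_linear(t, distance_pc) for t in beam_arcsec]
print([round(b, 1) for b in beam_pc])  # [6.2, 4.3]
```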

Figure 2 shows the contours and spectra of each cloud. The spectral signal-to-noise ratio averages 10 when smoothed to the typical linewidth of 2 km s⁻¹. The velocities of H I emission are indicated by a bar below each CO spectrum. The cloud properties are summarized in Table 1. The radii R range from 1.5 pc to 6 pc. They were obtained from $R = (A/\pi)^{0.5}$, where the area A was determined after deconvolution by quadratic difference with the beam area. The sum of all the line emission measured by ALMA is within a factor of two of the total emission found at 18″ resolution by the APEX telescope. The linewidths were corrected for instrumental spectral broadening.
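The radius measurement can be sketched as follows. Removing the beam "by quadratic difference" is read here as $R^2 = R_{\rm obs}^2 - R_{\rm beam}^2$, which is equivalent to subtracting areas because $A = \pi R^2$. That reading is an interpretation. The text does not define the beam area, so the FWHM ellipse used below, together with the observed area, is a hypothetical choice for illustration.

```python
import math


def deconvolved_radius(area_observed, area_beam):
    """Equivalent radius R = sqrt(A / pi) after removing the beam in quadrature.

    Subtracting areas is the same as R^2 = R_obs^2 - R_beam^2, since A = pi R^2.
    Returns None when the cloud is not larger than the beam (unresolved).
    """
    area = area_observed - area_beam
    if area <= 0:
        return None
    return math.sqrt(area / math.pi)


# Hypothetical beam: an ellipse with 6.2 pc x 4.3 pc full axes (pc^2).
beam_area = math.pi * (6.2 / 2) * (4.3 / 2)
# Hypothetical observed area, chosen so that the deconvolved radius is 2 pc.
print(deconvolved_radius(beam_area + math.pi * 2.0**2, beam_area))
```

An unresolved cloud returns None here. In the table, such a case would appear as an upper limit on the radius rather than a measured value.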

Virial masses for the CO clouds were calculated from the relation $M_{\rm vir}\,(M_\odot) = 1{,}044\,R\,\sigma^2$, for R in pc and Gaussian linewidths σ in km s⁻¹. The CO luminosity, in K km s⁻¹ pc², was calculated from $L_{\rm CO} = 2{,}453\,S_{\rm CO}\,\Delta V\,D^2$. Here S is the integrated emission in Jy km s⁻¹, ΔV is the FWHM of the line in km s⁻¹, and D is the distance in Mpc. Figure 3 shows the relationships between these quantities, including those for other dwarf galaxies (all for CO(1–0)). The CO clouds in WLM satisfy the usual correlations, although they are the smallest seen in any of these galaxies. Higher-resolution observations should reveal small clouds and/or cores in other galaxies too. The main point, however, is that WLM has no CO clouds as large as those seen elsewhere.

The virial mass gives some perspective on the previously derived conversion from CO luminosity to mass’. For the northwest region that conversion was α_CO ≈ (124 ± 60) M☉ pc⁻² (K km s⁻¹)⁻¹, derived from the dust-based H₂ column density. If we instead take the virial masses and CO luminosities in Table 1, the mean ratio is α_vir ≈ (28 ± 28) M☉ pc⁻² (K km s⁻¹)⁻¹. If the clouds are not gravitationally bound, α_vir would be smaller still. The two α values differ because most of the H₂ volume has no CO emission; CO apparently exists only in the densest cores of the H₂ clouds. In the Milky Way, CO and H₂ have about the same extent in star-forming clouds, which gives α_CO ≈ 4 M☉ pc⁻² (K km s⁻¹)⁻¹. When CO does not fill an H₂ cloud, α can be small for each CO core but large for the total H₂ cloud. If α is meant to give the total H₂ mass in a region from L_CO, then the large value should be used.
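The virial masses and the mean ratio of virial mass to CO luminosity can be recomputed from Table 1. In this sketch the tabulated radii, linewidths, virial masses and CO luminosities are taken as transcribed in this section. SE-2 is left out because only upper limits are given for its radius and virial mass. Whether the authors excluded it when forming the mean is an assumption. Without it, the mean ratio comes out close to the ≈28 quoted above.

```python
import statistics

# (radius pc, sigma km/s, M_vir Msun, L_CO K km/s pc^2) as listed in Table 1.
# SE-2 is omitted: only upper limits are given for its radius and virial mass.
TABLE1 = {
    "NW-1": (2.21, 1.05, 2548, 81.47),
    "NW-2": (1.49, 0.84, 1087, 56.23),
    "NW-3": (2.69, 0.75, 1561, 106.26),
    "NW-4": (2.69, 0.57, 898, 54.90),
    "SE-1": (1.68, 0.77, 1037, 113.57),
    "SE-3": (2.21, 0.69, 1113, 67.30),
    "SE-4": (6.01, 1.32, 10881, 571.17),
    "SE-5": (2.02, 1.81, 6896, 67.30),
    "SE-6": (3.37, 0.63, 1383, 68.85),
}


def virial_mass(radius_pc, sigma_kms):
    """M_vir (Msun) = 1,044 R sigma^2, with R in pc and sigma in km/s."""
    return 1044.0 * radius_pc * sigma_kms**2


for name, (r, sigma, m_table, _) in TABLE1.items():
    # Recomputed values agree with the tabulated ones to within a few per cent,
    # as expected from the rounding of R and sigma.
    print(f"{name}: {virial_mass(r, sigma):7.0f} vs tabulated {m_table}")

alphas = [m / lco for (_, _, m, lco) in TABLE1.values()]
print(f"mean M_vir/L_CO = {statistics.mean(alphas):.1f}, "
      f"s.d. = {statistics.stdev(alphas):.1f}")
```

The large scatter comes mostly from SE-5, which has the broadest line in the table.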

Whether the CO clouds are self-gravitationally bound can be estimated from the general requirement of an associated H₂ density of ~10³ cm⁻³ for collisional excitation’. In fact, the virial density of the CO clouds is comparable to this, n(H₂) = 4.1 × 10⁻²¹ g cm⁻³


¹Departamento de Astronomía, Universidad de Chile, Casilla 36-D, 8320000 Santiago, Chile. ²IBM Research Division, T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10598, USA. ³Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, Arizona 86001, USA. ⁴Centre for Astrophysics Research, University of Hertfordshire, Hatfield AL10 9AB, UK. ⁵Joint ALMA Observatory, Alonso de Córdova 3107, Vitacura, 7630355 Santiago, Chile. ⁶National Radio Astronomy Observatory, Avenida Nueva Costanera 4091, Vitacura, 7630197 Santiago, Chile. ⁷New Mexico Institute of Mining and Technology, Socorro, New Mexico 87801, USA.
Figure 1 | Tiny CO clouds in WLM. A colour composite of the various gas phases in WLM: green is H I (ref. 26), red is Hα (ref. 27) and blue is [C II] 158 μm (ref. 14). The CO emission is shown as single black contours inside the 1 arcmin × 1 arcmin white squares that outline the area mapped in


(~10³ cm⁻³). This value is the ratio of the virial mass (~2 × 10³ M☉) to the cloud volume (4πR³/3 for R ≈ 2 pc). Thus the clouds could be marginally bound.
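The virial-density estimate treats a typical cloud as a uniform sphere of mass ~2 × 10³ M☉ and radius ~2 pc. The sketch below reproduces the arithmetic. To convert the mass density into an H₂ number density, it assumes 2.8 hydrogen masses per H₂ molecule to account for helium. That convention is an assumption, because the text does not state which one it uses.

```python
import math

MSUN_G = 1.989e33     # solar mass, g
PC_CM = 3.086e18      # parsec, cm
M_H = 1.6735e-24      # hydrogen-atom mass, g
MU_H2 = 2.8           # assumed mass per H2 molecule in units of M_H, incl. helium


def mean_mass_density(mass_msun, radius_pc):
    """Mean density (g cm^-3) of a uniform sphere, M / (4 pi R^3 / 3)."""
    radius_cm = radius_pc * PC_CM
    return mass_msun * MSUN_G / (4.0 / 3.0 * math.pi * radius_cm**3)


rho_vir = mean_mass_density(2e3, 2.0)   # virial mass ~2e3 Msun, R ~ 2 pc
n_h2 = rho_vir / (MU_H2 * M_H)          # H2 number density, cm^-3
print(f"rho = {rho_vir:.1e} g cm^-3, n(H2) = {n_h2:.0f} cm^-3")
```

The mass density comes out at about 4 × 10⁻²¹ g cm⁻³, matching the quoted value. The number density is of order 10³ cm⁻³ for any reasonable choice of the mean molecular mass.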

Another measure of CO density comes from pressure equilibrium between the CO regions and the weight of the overlying H I and H₂ layers. The H₂ mass column density, Σ_H₂, comes from the difference


Table 1 | Properties of WLM CO clouds

Region   RA (h min s)    Dec. (° ′ ″)    Peak intensity (mJy)   V_LSR (km s⁻¹)
NW-1     00 01 57.162    −15 27 00.00    12.2                   −131.79 ± 0.19
NW-2     00 01 57.291    −15 26 52.80    16.1                   −136.42 ± 0.18
NW-3     00 01 57.901    −15 26 58.00    10.8                   −126.27 ± 0.15
NW-4     00 01 58.079    −15 27 00.12    12.2                   −125.38 ± 0.16
SE-1     00 02 01.485    −15 27 42.65    10.8                   −121.85 ± 0.18
SE-2     00 02 01.761    −15 27 55.83    13.3                   −118.18 ± 0.16
SE-3     00 02 01.801    −15 27 51.78    14.3                   −120.00 ± 0.12
SE-4     00 02 01.864    −15 28 00.52    8.77                   −118.01 ± 0.17
SE-5     00 02 02.101    −15 27 58.23    6.92                   −117.21 ± 0.48
SE-6     00 02 02.222    −15 27 52.08    13.7                   −117.79 ± 0.12

Dec., declination; NW, northwest; RA, right ascension; SE, southeast.


CO(1–0) by ALMA. The synthesized ALMA beam (0.9″ × 1.3″) is shown in the bottom left corner of each square. The inset in the top left is the full view of WLM obtained by combining H I and optical data: red is H I, green is V band and blue is GALEX FUV”’.


between the total gas column density derived from the dust emission and the H I column density observed at 21 cm. For the northwest region”, Σ_H₂ = (31 ± 15) M☉ pc⁻². Adding the H I column density” gives Σ_total = (58 ± 15) M☉ pc⁻². The corresponding pressure from self-gravity is $(\pi/2)\,G\,\Sigma_{\rm total}^2 \approx 1.6 \times 10^{-11}$ dyn cm⁻². Taking the typical CO velocity dispersion of our clouds, σ ≈ 0.9 km s⁻¹, the ratio

Table 1 (continued)

Region   Integrated flux density (Jy km s⁻¹)   Radius (pc)    σ (km s⁻¹)     M_vir (M☉)        L_CO (K km s⁻¹ pc²)
NW-1     0.037 ± 0.004                          2.21 ± 1.11    1.05 ± 0.17    2,548 ± 1,522     81.47 ± 8.39
NW-2     0.025 ± 0.003                          1.49 ± 0.77    0.84 ± 0.28    1,087 ± 919       56.23 ± 5.69
NW-3     0.048 ± 0.005                          2.69 ± 1.35    0.75 ± 0.14    1,561 ± 985       106.26 ± 10.71
NW-4     0.025 ± 0.003                          2.69 ± 1.35    0.57 ± 0.14    898 ± 637         54.90 ± 5.84
SE-1     0.051 ± 0.005                          1.68 ± 0.87    0.77 ± 0.18    1,037 ± 720       113.57 ± 11.46
SE-2     0.021 ± 0.002                          < 1            0.61 ± 0.23    < 390 ± 300       46.93 ± 5.02
SE-3     0.030 ± 0.003                          2.21 ± 1.15    0.69 ± 0.09    1,113 ± 653       67.30 ± 6.96
SE-4     0.258 ± 0.026                          6.01 ± 1.20    1.32 ± 0.14    10,881 ± 3,209    571.17 ± 57.20
SE-5     0.030 ± 0.003                          2.02 ± 0.96    1.81 ± 0.57    6,896 ± 5,426     67.30 ± 7.16
SE-6     0.031 ± 0.003                          3.37 ± 1.06    0.63 ± 0.15    1,383 ± 805       68.85 ± 7.11
Figure 2 | CO clouds and spectra. CO contour maps of the integrated
emission starting at the 2-s.d. level (right ascension and declination in J2000.0 
coordinates). Different CO clouds are identified by colour. The ALMA beam is 
the black ellipse in the lower left corner. The CO spectrum corresponding to 


of the core pressure to the square of the CO velocity dispersion is the equilibrium core density, 1.9 × 10⁻²¹ g cm⁻³, corresponding to 500 H₂ cm⁻³. Thus the virial density, the excitation density and the pressure-equilibrium density are all about 10³ cm⁻³.
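The pressure-equilibrium argument can be checked numerically. The sketch assumes the self-gravitating-layer expression $P = (\pi/2)\,G\,\Sigma_{\rm total}^2$ with Σ_total = 58 M☉ pc⁻². It then divides by the square of the 0.9 km s⁻¹ CO velocity dispersion to obtain the equilibrium core density. The physical constants are standard cgs values. The prefactor π/2 is an assumption that reproduces the quoted ~1.6 × 10⁻¹¹ dyn cm⁻² to within rounding.

```python
import math

G_CGS = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
MSUN_G = 1.989e33     # solar mass, g
PC_CM = 3.086e18      # parsec, cm

sigma_total = 58.0 * MSUN_G / PC_CM**2              # 58 Msun pc^-2 in g cm^-2
pressure = 0.5 * math.pi * G_CGS * sigma_total**2   # assumed (pi/2) G Sigma^2, dyn cm^-2
velocity_dispersion = 0.9e5                         # 0.9 km/s in cm/s
rho_core = pressure / velocity_dispersion**2        # equilibrium core density, g cm^-3
print(f"P = {pressure:.2e} dyn cm^-2, rho = {rho_core:.2e} g cm^-3")
```

The result is about 1.5 × 10⁻¹¹ dyn cm⁻² and 1.9 × 10⁻²¹ g cm⁻³, in line with the values in the text.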

In the Milky Way, molecules form above a threshold extinction of A_V = 0.3 mag for H₂ and ~1.5 mag for CO"*. These thresholds correspond to mass column densities of 6.1 M☉ pc⁻² and 30.3 M☉ pc⁻² in the solar neighbourhood. In WLM, where the metallicity is 13% solar, the mass thresholds for the same extinctions are 47 M☉ pc⁻² and 230 M☉ pc⁻², respectively. The first is satisfied by the H I + H₂ envelope




each detection is plotted. The velocity range of H I emission (FWHM) is shown as a rectangular box on the abscissa (local standard of rest, LSR); the CO velocities agree with the H I.


of the CO cores (~58 M☉ pc⁻²). The second is satisfied by a total column density of 220 M☉ pc⁻². This total combines the H I and H₂ envelope with the H₂ in the embedded CO core itself, as determined from the CO virial mass (2 × 10³ M☉) and the radius measured by ALMA (2 pc). These results suggest that the CO clouds in WLM are normal in terms of density, pressure and column density, which explains why they lie on the standard correlations in Fig. 3. They also appear to be marginally self-bound by gravity, suggesting that they are related to star formation. Their properties are typical of parsec-size molecular cloud cores in the solar neighbourhood”.
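The WLM thresholds follow from scaling the solar-neighbourhood column densities by 1/Z. This assumes that the dust-to-gas ratio, and hence the extinction per unit gas column, scales linearly with metallicity. The text implies that assumption but does not state it. A short check:

```python
Z_WLM = 0.13   # WLM metallicity relative to solar

# Solar-neighbourhood thresholds quoted in the text: (A_V in mag, mass column in Msun pc^-2)
solar_thresholds = {"H2": (0.3, 6.1), "CO": (1.5, 30.3)}

# Assumption: extinction per unit gas column scales linearly with metallicity, so
# reaching the same A_V requires a gas column larger by a factor 1/Z.
wlm_thresholds = {mol: column / Z_WLM for mol, (_, column) in solar_thresholds.items()}
print({mol: round(col) for mol, col in wlm_thresholds.items()})  # {'H2': 47, 'CO': 233}
```

The text rounds the CO threshold of 233 M☉ pc⁻² to 230 M☉ pc⁻².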


Figure 3 | Correlations for CO clouds in dwarf galaxies. a, b, The symbols refer to different galaxies (SMC, dwarfs, M31 and M33 (ref. 28); LMC (ref. 29)). Error bars are 1 s.d. a, CO linewidth σ versus radius R. The solid line is a fit to WLM, the SMC and the dwarf galaxies: $\sigma = (0.48 \pm 0.08) \times R^{0.53 \pm 0.05}$. The dashed line also includes the LMC: σ = (0.40 ± 0.03) × R°*?*°. The black short-dashed line and the grey area indicate the standard relation for the Milky Way”: σ = (0.72 ± 0.07) × R°*°. R for WLM is measured in the same way as for the Milky Way and the other galaxies. b, Virial mass versus CO
Our observation explains why star clusters have about the same
central densities in dwarf irregular'® and spiral galaxies’’, even though 
the ambient gas density in dwarfs is much less than it is in spirals”. If 
the unifying process of star formation is the formation of CO and other 
asymmetric molecules for cooling (however, see refs 16, 21), then the 
similarity between the CO cores in the two cases accounts for the 
uniformity of clusters. The small mass of the CO cores in WLM also 
explains why most dwarf galaxies do not form high-mass clusters’®. 
The CO parts of interstellar clouds are smaller at lower metallicities, so 
the clusters that result are smaller too. For example, there are no 
massive young clusters in these regions of WLM”. This lack of massive 
clusters is usually attributed to sparse sampling of the cluster mass 
distribution function at low star formation rates'’, but the present 
observations suggest it could result from some physical reason too, 
such as the lack of massive CO clouds at low metallicity. 

When the local dwarf galaxies NGC 1569 and NGC 5253 formed 
massive clusters, there was a major impact event to increase the pres- 
sure and mass at high density”*”*. Such an impact would also seem to 
be needed for the formation of old halo globular clusters, which are 
massive and have low metallicity like their former dwarf galaxy 
hosts**”*. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

ALMA observations. We observed the CO(J = 1-0) transition in two regions in WLM using ALMA, located on the Chajnantor Plateau in northern Chile, during Cycle 1. Observations were carried out on 8 July 2013 and 3 April 2014. The ALMA receivers were tuned to the ground rotational transition of carbon monoxide, CO(1-0). The interferometer configuration C32-2/C32-3 provides a maximum baseline of 0.442 km. The observations were made with a spectral resolution of 122 kHz per channel (0.32 km s⁻¹) and a total bandwidth of 468.750 MHz per baseband. The source J2258-2758 was used as the bandpass calibrator, and J2357-1125 was used to calibrate amplitudes and phases over time. Uranus was observed to set the absolute flux scale. We estimate an uncertainty of 10% in the absolute calibration.
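The quoted channel width follows from the Doppler relation Δv = cΔν/ν₀. The following is a minimal check that assumes the CO(1-0) rest frequency of 115.271202 GHz and the radio velocity convention. Neither is stated in the text, and the helper name is illustrative.

```python
C_KM_S = 299_792.458        # speed of light, km/s
NU_REST_GHZ = 115.271202    # assumed CO(1-0) rest frequency, GHz


def width_km_s(delta_nu_hz: float, nu_rest_ghz: float = NU_REST_GHZ) -> float:
    """Velocity width corresponding to a frequency interval: dv = c * dnu / nu_rest."""
    return C_KM_S * delta_nu_hz / (nu_rest_ghz * 1e9)


print(f"channel width: {width_km_s(122e3):.3f} km/s")
print(f"baseband coverage: {width_km_s(468.750e6):.0f} km/s")
```

The same relation gives roughly 1,200 km s⁻¹ of velocity coverage for the 468.750 MHz baseband, far wider than the range examined around WLM.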

The data were calibrated, mapped and cleaned using the ALMA reduction software CASA (version 4.2.1). Rather than use the pipeline-delivered science data cubes, we redid the cleaning (that is, Fourier transform and beam deconvolution) using a better definition of the masks for regions containing emission, and natural weighting to optimize sensitivity. The maximum angular scale for recovered emission was estimated to be 15″.
Identifying sources. To make a first cut at identifying sources, we convolved the image cube to a 1.25″ × 1.25″ beam and examined a wide velocity range around the velocity expected from the APEX detection. For the southeast region we expected signal around V_LSR = −120.5 km s⁻¹ and examined −130.5 km s⁻¹ to −110.5 km s⁻¹, detecting candidate sources at −123 km s⁻¹ to −115.5 km s⁻¹. For the northwest region we expected signal around −130.5 km s⁻¹ and examined −140.5 km s⁻¹ to −120.5 km s⁻¹, detecting potential sources at −139 km s⁻¹ to −121.5 km s⁻¹. In each velocity channel we looked for knots that had more counts than the majority of knots, which were noise. We then looked for signal in nearly the same location in successive channels, expecting coherence over at least three channels because of the Hanning smoothing that had been applied. We also generally expected the signal to build up and fade away as the channels sampled the source spectrum. With these criteria, we rated the confidence level of each candidate source as ‘confident’, ‘certain’, ‘not so certain’ or ‘uncertain’. For the southeast region, we identified nine candidate sources, six of which were ranked as ‘confident’ or ‘certain’. In the northwest region, we identified 20 potential sources, four ranked as ‘certain’ and the rest as less certain.
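The channel-coherence criterion described above can be sketched as a run-length test on a thresholded cube. This is a hypothetical helper and not the procedure the authors used, since their ranking was done by inspection. The 3σ per-channel threshold, the function name `persistent_knots` and the toy cube in the usage lines are all assumptions.

```python
import numpy as np


def persistent_knots(cube, noise, n_sigma=3.0, min_channels=3):
    """Boolean (ny, nx) map of pixels that exceed n_sigma * noise in at least
    `min_channels` consecutive velocity channels.

    cube: array of shape (n_channels, ny, nx).
    """
    above = np.asarray(cube) > n_sigma * noise      # per-channel "knot" mask
    run = np.zeros(above.shape[1:], dtype=int)      # current run length per pixel
    longest = np.zeros_like(run)                    # longest run seen per pixel
    for channel in above:
        run = np.where(channel, run + 1, 0)         # extend the run or reset it
        longest = np.maximum(longest, run)
    return longest >= min_channels


# Usage on a hypothetical cube with unit noise:
rng = np.random.default_rng(1)
cube = rng.normal(0.0, 1.0, (40, 32, 32))
cube[12:17, 8, 8] += 6.0      # toy source persisting over five channels
cube[20, 25, 25] += 6.0       # single-channel spike, which the run test rejects
print(np.argwhere(persistent_knots(cube, noise=1.0)))
```

Requiring consecutive channels rather than a single bright channel mirrors the text's expectation that Hanning smoothing spreads real signal over at least three channels.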


On the basis of this identification, we integrated the emission over the velocity range in which CO was seen, and produced the two velocity-integrated maps shown in Fig. 2 from our newly reduced cubes, which have higher sensitivity and velocity resolution. The velocity resolution of these cubes is 0.5 km s⁻¹ per channel. All velocities are in the local standard of rest (LSR) system. For WLM-SE, five integrated maps were made covering a total LSR velocity range of V_LSR = −121 km s⁻¹ to −115.5 km s⁻¹. The maps spanned velocities of −121.0 to −115.5 km s⁻¹, −121.5 to −119.0 km s⁻¹, −119.0 to −117.5 km s⁻¹, −118.5 to −117.0 km s⁻¹, and −124.0 to −120.5 km s⁻¹. For WLM-NW, four integrated maps were made covering V_LSR = −136.5 km s⁻¹ to −124 km s⁻¹, with individual ranges of −137.0 to −135.5 km s⁻¹, −133.5 to −130.0 km s⁻¹, −127.5 to −125.5 km s⁻¹, and −127.0 to −125.5 km s⁻¹. For sources that showed emission at the 3-s.d. level or above, a spectrum was obtained by integrating over an area delineated by a contour drawn at 2 s.d. (see Fig. 2), so as not to miss any genuine emission. We also produced velocity-RA and velocity-dec. maps. After inspecting the CO spectra and the velocity-position maps, we confirmed 10 CO clouds of the original 20 candidates. The remaining 10 were deemed to have too low a signal-to-noise ratio to be included in this study. On each CO spectrum plot we included the HI emission FWHM velocity width, converting the HI heliocentric velocity to LSR velocity using V_LSR = V_helio − 2.5 km s⁻¹.
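The map-making and spectrum-extraction steps above can be sketched with NumPy as follows. This is an illustration only, because the actual maps were made in CASA. The helper names, the toy cube and the √N noise estimate are assumptions. Hanning smoothing correlates neighbouring channels, so √N somewhat underestimates the true noise of the integrated map.

```python
import numpy as np


def helio_to_lsr(v_helio_km_s):
    """Heliocentric to LSR velocity using the WLM offset quoted in the text."""
    return np.asarray(v_helio_km_s, dtype=float) - 2.5


def integrated_map(cube, velocities, v_lo, v_hi, channel_width):
    """Moment-0 map: sum channels with v_lo <= v <= v_hi, times the channel width.
    Returns the map and the number of channels used."""
    velocities = np.asarray(velocities)
    selected = (velocities >= v_lo) & (velocities <= v_hi)
    return cube[selected].sum(axis=0) * channel_width, int(selected.sum())


def integrated_map_rms(channel_rms, channel_width, n_channels):
    """Noise of a moment-0 map, assuming independent channels (approximate here)."""
    return channel_rms * channel_width * np.sqrt(n_channels)


def extract_spectrum(cube, moment0, moment0_rms, level=2.0):
    """Sum every channel over the pixels inside the `level`-sigma contour of the map."""
    mask = moment0 >= level * moment0_rms
    return cube[:, mask].sum(axis=1), mask


# Usage on a hypothetical noise-only cube with an injected toy source:
velocities = np.arange(-140.0, -110.0, 0.5)            # 0.5 km/s channels, LSR
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 1.0, (velocities.size, 32, 32))
in_range = (velocities >= -121.0) & (velocities <= -115.5)
cube[in_range, 10:13, 10:13] += 3.0
mom0, n_used = integrated_map(cube, velocities, -121.0, -115.5, 0.5)
spectrum, mask = extract_spectrum(cube, mom0, integrated_map_rms(1.0, 0.5, n_used))
```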

The total flux of the 10 clouds resolved with ALMA was compared with the CO(3-2) flux in our previous APEX observations. We converted the CO(3-2) APEX fluxes from K km s⁻¹ to Jy and assumed a thermal CO(1-0)/CO(3-2) line ratio of one. For WLM-SE we recovered the same flux, 0.42 Jy, in both cases. For WLM-NW we measured an ALMA flux of 0.14 Jy, whereas the APEX flux converted to CO(1-0) is 0.66 Jy. The difference in the northwest region could arise in three ways: from a different line ratio, and thus different physical conditions; from weaker emission not included by our criteria for defining CO clouds; or from emission that is larger in angular extent than the largest structures measured by the interferometer and is therefore absent from our maps. Taking both regions together, the flux measured with ALMA is within a factor of 2 of the flux measured with APEX.
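The ‘factor of 2’ statement can be checked directly from the fluxes quoted above. The dictionary layout is only for illustration.

```python
# CO(1-0) fluxes in Jy; the APEX values are CO(3-2) converted with an assumed line ratio of one.
alma = {"WLM-SE": 0.42, "WLM-NW": 0.14}
apex = {"WLM-SE": 0.42, "WLM-NW": 0.66}

for region in alma:
    print(f"{region}: ALMA/APEX = {alma[region] / apex[region]:.2f}")

total_ratio = sum(alma.values()) / sum(apex.values())
print(f"both regions: ALMA/APEX = {total_ratio:.2f}")
```

The combined ratio, 0.56 Jy/1.08 Jy ≈ 0.52, shows that the whole shortfall comes from the northwest region.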
Quadrature squeezed photons from a two-level system


Carsten H. H. Schulte, Jack Hansom, Alex E. Jones, Clemens Matthiesen, Claire Le Gall & Mete Atatüre


Resonance fluorescence arises from the interaction of an optical field with a two-level system, and has played a fundamental role in the development of quantum optics and its applications. Despite its conceptual simplicity, it entails a wide range of intriguing phenomena, such as the Mollow-triplet emission spectrum¹, photon antibunching² and coherent photon emission³. One fundamental aspect of resonance fluorescence, namely squeezing in the form of reduced quantum fluctuations in the single photon stream from an atom in free space, was predicted more than 30 years ago⁴. However, two factors have continued to necessitate stringent experimental conditions for the observation of squeezing with atoms: the requirement to operate in the weak excitation regime, and the combination of the modest oscillator strength of atoms with low collection efficiencies. Attempts to circumvent these issues had to sacrifice antibunching, owing to either stimulated forward scattering from atomic ensembles⁵,⁶ or multi-photon transitions inside optical cavities⁷,⁸. Here, we use an artificial atom with a large optical dipole, which enables a 100-fold improvement of the photon detection rate over the natural atom counterpart⁹, and we reach the necessary conditions for the observation of quadrature squeezing in single resonance-fluorescence photons. By implementing phase-dependent homodyne intensity-correlation detection¹⁰,¹¹, we demonstrate that the electric field quadrature variance of resonance fluorescence is three per cent below the fundamental limit set by vacuum fluctuations, while the photon statistics remain antibunched. The simultaneous presence of squeezing and antibunching is a fully non-classical outcome of the wave-particle duality of photons.

The minimum fluctuations in any quantum measurement of canonically conjugate variables, such as position and momentum, are bounded by the Heisenberg uncertainty principle. Although this principle cannot be violated, the fluctuations of a single variable can be reduced below this minimum value at the expense of enhancing the fluctuations of the conjugate variable. The most widely explored realization of this non-classical phenomenon is squeezed light¹², where the quadrature operators $X_1$ and $X_2$ of the electric field are the canonically conjugate operators. Because it relies inherently on a quadratic dependence on the bosonic creation and annihilation operators in the Hamiltonian, squeezed light is typically generated using intense lasers and macroscopic nonlinear optical media¹³. This form of squeezed light has multiple applications in the field of quantum optics¹⁴, one prominent example being interferometry with reduced quantum noise¹⁵,¹⁶.

In 1981, it was predicted⁴ that the quadratic form of the Hamiltonian is not a requirement, and that quadrature squeezing can also be obtained via a radically different approach: the interaction of a two-level system with a resonant light field, as described by the Jaynes-Cummings Hamiltonian. The fluctuations in one quadrature, quantified by their variance, can be reduced by up to a theoretical maximum of 12.5% below vacuum fluctuations, while the intensity statistics remain antibunched. Unlike its nonlinear optics counterpart, this unique form of squeezed light stems from a build-up of atomic coherence. Once mapped onto the emitted field, this coherence creates coherences between the n = 0 and n = 1 Fock states in the weak excitation regime. Higher number states are excluded by photon antibunching or, equivalently, by the fermionic nature of atomic operators. The simultaneous presence of antibunching and squeezing is an intriguing yet counter-intuitive effect, because single photons do not have a well-defined phase. It is the coherence with the zero-photon (vacuum) component that allows for phase-dependent effects such as the squeezing discussed here.

The two-level system we use is a voltage-controlled semiconductor quantum dot¹⁷ positioned under a solid immersion lens for enhanced photon collection efficiency. These artificial atoms typically allow photon detection rates well above a million photons per second. They therefore obviate the immediate need for cavity coupling, and consequently allow for the experimental realization of the isolated, weakly excited two-level system treated in ref. 4. With large oscillator strength, high internal quantum efficiency and negligible decoherence, semiconductor quantum dots have enabled recent observations of key phenomena in quantum optics, such as antibunching¹⁸,¹⁹, formation of dressed states²⁰⁻²², generation of entangled photon pairs²³⁻²⁵ and, particularly relevant for this work, coherent single photon generation via weak excitation²⁶,²⁷. The strong transition dipole moment of the quantum dot, compared with single atoms, is also the key advantage in the detection of squeezing in resonance fluorescence. Although the maximum degree of squeezing is in principle independent of the oscillator strength of the emitter, the resulting increase in the photon scattering rate yields the signal-to-noise ratio necessary for the experimental observation of this effect (see Methods).

To generate resonance fluorescence, we resonantly excite the π-polarized neutral exciton transition of a single quantum dot at 970 nm using a frequency-stabilized tunable laser (Fig. 1a). Resonance fluorescence and the reflected laser are separated by a polarizing beam splitter. After a relative phase has been imparted through a path length difference Δℓ, the two fields are recombined via a non-polarizing beam splitter. One of the outputs of this beam splitter contains the superimposed light (SL) field

$$E^{(+)}_{\mathrm{SL}}(t) = E^{(+)}_{\mathrm{LO}}(t)\,e^{i\phi} + E^{(+)}_{\mathrm{RF}}(t) \qquad (1)$$
where t is time and the (+) superscripts denote the positive frequency part of the operator. The subscript LO (RF) indicates the local oscillator (resonance fluorescence). The relative amplitude and the phase φ of the local oscillator and resonance fluorescence fields can be tuned independently in the experimental scheme illustrated in Fig. 1a. Reflection and transmission coefficients and all other relative phases due to the optical setup are included in the field amplitudes and the phase φ. Figure 1b shows the intensity measured by a single detector as a function of the interferometer-induced phase kΔℓ, where k is the wavenumber of the fields. The phase due to the dipolar response of the transition, which is determined by the relative detuning between the excitation laser and the transition frequency, is also included in φ. Figure 1c displays the measured detuning dependence of this additional phase.

The amplitude $\langle E(\phi)\rangle$ of a light field, where φ is the relative phase with respect to a coherent reference field, can be represented in the phase space of the conjugate variables $X_1$ and $X_2$ via $\langle E(\phi)\rangle \propto \langle X(\phi)\rangle = \langle X_1\cos\phi + X_2\sin\phi\rangle$. These quadratures are the analogues of the dimensionless position and momentum operators. Their variances, which quantify the quantum fluctuations of the electric field, are subject to a similar uncertainty relation: $\langle(\Delta X_1)^2\rangle\langle(\Delta X_2)^2\rangle \geq 1/16$.

Figure 1 | Homodyne intensity-correlation detection. a, Schematic illustration of the homodyne intensity-correlation setup. QD, quantum dot; LO, local oscillator; SL, superimposed light; LP, linear polarizer; PBS, polarizing beam splitter; ND, neutral density filter; λ/2, half-wave plate; HBT, Hanbury Brown and Twiss correlation setup. b, Intensity of the superimposed light field on a single detector as a function of the interferometer phase at an excitation power of s = 0.1. At this excitation power, each detector of the Hanbury Brown and Twiss correlation setup records 1.6 × 10⁵ events per second from the resonance fluorescence contribution alone. c, Dipole phase offset in the interference pattern as a function of detuning between laser and quantum dot frequency. d, e, Intensity autocorrelation measurement with the local oscillator path blocked (d) and unblocked (e). In the blocked case (d), the $G^{(2)}_{\mathrm{RF}}(\tau)$ measurement gives the expected antibunching, which is evidence of a single two-level system, regardless of φ. The ordinate in d and e shows the coincidences in units of count rates for comparison. For the unblocked local oscillator (e), the interference between the local oscillator and quantum dot fields leads to phase-dependent correlations, some of which contain the quadrature variance of the quantum dot field. For b-e, the experimental data are shown as filled circles, and the theoretical simulations using a two-level master equation for the corresponding experimental conditions are shown as solid lines.
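The quadrature definition and uncertainty relation above can be illustrated numerically, together with the earlier point that this squeezing comes from coherence between the n = 0 and n = 1 Fock states. The sketch below builds truncated ladder operators with NumPy and uses the normalization X(φ) = (a e^{−iφ} + a† e^{iφ})/2. This normalization is an assumption, chosen so that the vacuum variance is 1/4 and matches the 1/16 bound. The sketch then evaluates the variances of a toy pure state √(1−p)|0⟩ + √p|1⟩. This state is only an illustration: real resonance fluorescence also contains incoherently scattered light, so its squeezing is bounded by the 12.5% figure rather than by this pure-state model.

```python
import numpy as np

N = 5  # Fock-space cutoff (photon numbers 0..4); only |0> and |1> are populated below

a = np.diag(np.sqrt(np.arange(1.0, N)), k=1).astype(complex)  # a|n> = sqrt(n)|n-1>
a_dag = a.conj().T


def quadrature(phi):
    """X(phi) = (a e^{-i phi} + a^dag e^{i phi}) / 2, so the vacuum variance is 1/4."""
    return (a * np.exp(-1j * phi) + a_dag * np.exp(1j * phi)) / 2.0


def variance(psi, op):
    """<op^2> - <op>^2 for a normalized state vector psi and a Hermitian operator op."""
    mean = np.vdot(psi, op @ psi).real
    mean_sq = np.vdot(psi, op @ (op @ psi)).real
    return mean_sq - mean ** 2


def zero_one_superposition(p):
    """Toy pure state sqrt(1-p)|0> + sqrt(p)|1> with single-photon probability p."""
    psi = np.zeros(N, dtype=complex)
    psi[0], psi[1] = np.sqrt(1.0 - p), np.sqrt(p)
    return psi


vacuum = zero_one_superposition(0.0)
state = zero_one_superposition(0.1)
for name, psi in [("vacuum", vacuum), ("p = 0.1", state)]:
    v1 = variance(psi, quadrature(0.0))          # X1 quadrature
    v2 = variance(psi, quadrature(np.pi / 2))    # X2 quadrature
    print(f"{name}: var X1 = {v1:.3f}, var X2 = {v2:.3f}, product = {v1 * v2:.4f}")
```

For p = 0.1 the X₁ variance drops to 0.21, below the vacuum value of 0.25, while the X₂ variance rises to 0.30. The product, 0.063, still respects the 1/16 bound.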


Figure 2 | Phase-dependent quadrature variance of resonance fluorescence. a, In-phase (orange) and in-quadrature (blue) normally ordered autocorrelations of the electric field quadrature fluctuations, $\langle :\Delta X(\phi,0)\,\Delta X(\phi,\tau): \rangle$, for high (top panel) and low (bottom panel) power excitation conditions. Negative values in the bottom panel verify squeezing of the in-phase electric field variance. b, Full phase dependence of the quadrature variances (zero time delay of the autocorrelations) for different excitation powers. A measurement of coherent laser quadratures provides a reference for the vacuum limit of zero (green circles, right-most panel). Solid curves in all panels are theoretical simulations using a two-level master equation for the corresponding experimental conditions. Error bars represent a statistical uncertainty of one standard deviation of the correlations at long delays (τ > 5 ns).

To demonstrate a squeezed quadrature, $\langle(\Delta X(\phi))^2\rangle < 1/4$, we implement the experimental setup proposed in ref. 11, which provides a direct and convenient link between time-correlated two-photon detection and the variances of field quadratures. To detect the variances of the resonance fluorescence field quadratures $X_{1,2}$, we perform an intensity autocorrelation on the superimposed light field $E_{\mathrm{SL}}$ using a Hanbury Brown and Twiss (HBT) correlation setup. The unnormalized second-order correlation function of the superimposed light field


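$$G^{(2)}_{\mathrm{SL}}(t, t+\tau) = \langle E^{(-)}_{\mathrm{SL}}(t)\,E^{(-)}_{\mathrm{SL}}(t+\tau)\,E^{(+)}_{\mathrm{SL}}(t+\tau)\,E^{(+)}_{\mathrm{SL}}(t)\rangle \qquad (2)$$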


(where τ is the time delay and the (−) superscripts denote the negative frequency part of the operator) reduces, in the absence of a local oscillator, to the well-known antibunched second-order correlation function of the resonance fluorescence field, $G^{(2)}_{\mathrm{RF}}$, which demonstrates the single photon nature of resonance fluorescence². The solid red circles in Fig. 1d display this behaviour for an excitation power of $s = P/P_{\mathrm{sat}} = 0.1$, where the saturation power $P_{\mathrm{sat}}$ yields half of the maximum attainable resonance fluorescence intensity. The black curve is the theoretical prediction obtained with a two-level master equation. It includes the detector response function as well as sub-linewidth spectral wandering of the quantum dot transition frequency²⁸,²⁹. In the presence of the local oscillator field, $G^{(2)}_{\mathrm{SL}}$ displays a strong dependence on the phase φ, as shown in Fig. 1e for φ = 0, π/2 and π. The coincidence rate at long time delays changes with φ by more than an order of magnitude, while the correlation behaviour at short time delays evolves from a dip to a peak as a function of φ.
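The coincidence histograms in Fig. 1d, e are plotted in count-rate units rather than as a normalized g⁽²⁾(τ). A common way to obtain the conventional normalization is to divide each bin by the number of accidental coincidences expected for uncorrelated light, R₁R₂ΔtT. The sketch below is a generic illustration under that assumption: it presumes low count rates, so that start-stop histograms approximate the true correlation function. It is not the authors' analysis pipeline, and the rates in the usage lines are hypothetical.

```python
import numpy as np


def normalize_coincidences(counts, rate_1, rate_2, bin_width_s, duration_s):
    """Normalize an HBT coincidence histogram to g2(tau).

    For uncorrelated (Poissonian) light the expected number of coincidences per
    bin is rate_1 * rate_2 * bin_width * duration, so g2 tends to 1 at long delays.
    """
    accidental = rate_1 * rate_2 * bin_width_s * duration_s
    return np.asarray(counts, dtype=float) / accidental


def coincidence_rate(counts, duration_s):
    """Coincidences per second in each bin: the unnormalized, count-rate form."""
    return np.asarray(counts, dtype=float) / duration_s


# Hypothetical example: 1e5 counts/s on each detector, 0.1 ns bins, 60 s integration,
# giving 60 accidental coincidences per bin.
g2 = normalize_coincidences([3, 30, 58, 61, 60], 1e5, 1e5, 1e-10, 60.0)
print(np.round(g2, 2))
```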

The total correlation function $G^{(2)}_{\mathrm{SL}}$ contains five terms scaling with $|E_{\mathrm{LO}}|^n$ for n = 0, 1, 2, 3, 4. We separate their contributions via their distinct dependences on the time delay (τ) and the relative phase (φ), as well as via direct measurement of the zeroth-order contribution (Fig. 1d); see Supplementary Information. The second-order contribution is directly proportional to the normally ordered autocorrelation of ΔX(φ, τ):


$$\Delta G^{(2)}(\tau) \propto \langle :\Delta X(\phi,0)\,\Delta X(\phi,\tau): \rangle \qquad (3)$$


where ⟨: … :⟩ denotes normal ordering of the field operators. The zero-delay value of equation (3) hence yields the normally ordered variance $\langle :(\Delta X(\phi))^2: \rangle$ of the electric field quadrature. This variance is zero for vacuum and coherent fields, and quadrature squeezing manifests itself as a negative normally ordered variance, $\langle :(\Delta X(\phi))^2: \rangle < 0$.
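The following is a minimal sketch of where the 12.5% figure comes from. It assumes an ideal, radiatively limited two-level system driven on resonance, for which ρ_ee = s/[2(1 + s)] and |⟨σ⟩|² = s/[2(1 + s)²], with the emitted field taken as proportional to σ. Because σ² = 0, the normally ordered variance reduces to ⟨:(ΔX(φ))²:⟩ = ρ_ee/2 − [Re(⟨σ⟩e^{−iφ})]², which is negative for s < 1 at the optimal phase. The function names are illustrative. The model also omits the detector timing jitter, phase noise and spectral wandering that the measured curves include.

```python
import math


def excited_population(s):
    """rho_ee of a resonantly driven, radiatively limited two-level system; s = P/P_sat."""
    return s / (2.0 * (1.0 + s))


def coherence_squared(s):
    """|<sigma>|^2, the coherently scattered part of the emission."""
    return s / (2.0 * (1.0 + s) ** 2)


def normalized_variance(s, phi):
    """<:(Delta X(phi))^2:> divided by the vacuum variance 1/4, for a field proportional
    to sigma. phi is measured from the squeezed quadrature (phi = 0 is most negative)."""
    return 2.0 * excited_population(s) - 4.0 * coherence_squared(s) * math.cos(phi) ** 2


def to_db(normalized):
    """Total quadrature variance relative to vacuum, (1 + normalized), expressed in dB."""
    return 10.0 * math.log10(1.0 + normalized)


# Scan s on a grid for the most negative in-phase variance.
best_s = min((k / 1000 for k in range(1, 1000)), key=lambda s: normalized_variance(s, 0.0))
best = normalized_variance(best_s, 0.0)
print(f"optimum s ~ {best_s:.3f}: {best:+.4f} of vacuum ({to_db(best):.2f} dB)")

limit_at_01 = normalized_variance(0.1, 0.0)
print(f"ideal limit at s = 0.1: {limit_at_01:+.4f}; 3.1% is {0.031 / -limit_at_01:.0%} of it")
```

In this model the normalized variance at the optimal phase is s(s − 1)/(1 + s)². It reaches a minimum of −1/8 (−0.58 dB) at s = 1/3 and vanishes at s = 1. The text quotes s = 0.36 for the optimum, and the small offset may stem from how s is defined experimentally; that question is not settled here. At s = 0.1 the model limit is about −7.4%, so the measured 3.1% is roughly 40% of it.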

Figure 2a presents the autocorrelation of the in-phase (φ = 0) and in-quadrature (φ = π/2) fluctuations for high excitation power (s = 3, upper panel of Fig. 2a). The normally ordered variance (τ = 0) of resonance fluorescence in the high-power regime is positive regardless of the phase. This indicates that the quantum fluctuations are enhanced above the vacuum level, as expected. In stark contrast, the low-power regime (s = 0.1, lower panel of Fig. 2a) yields negative values of the normally ordered variance for φ = 0. This reduction of quantum fluctuations below the vacuum limit verifies quadrature squeezing in this measurement. As dictated by the Heisenberg uncertainty relations, this squeezing is accompanied by increased fluctuations, that is, antisqueezing, in the other quadrature. Both features decay on a timescale of the order of the excited-state lifetime.

Figure 2b shows how the normally ordered variance evolves as a function of φ for excitation powers ranging from s = 0.05 (left-most panel) to s = 3 (right-most panel). A measurement with a weak laser of similar intensity is displayed in the right-most panel as green circles. As expected for a coherent state, this measurement yields a normally ordered variance of zero, independent of φ. The squeezing vanishes⁴ for s ≥ 1, yielding larger fluctuations than vacuum for any φ. Although we observe the phase dependence of the quadrature variance at all excitation powers, the window of opportunity for measuring negative values is restricted to a very small φ range in the low-excitation regime (s < 1). This highlights the challenges that have been associated with the observation of squeezing in resonance fluorescence since its prediction.

Figure 3a summarizes the power dependence of the normally ordered quadrature variance extrema. The solid blue (red) curve represents the theoretically predicted behaviour of the in-phase (in-quadrature) field variance for an ideal two-level system. The maximum possible squeezing is limited to 12.5% (0.58 dB) below vacuum fluctuations at s = 0.36. The dashed curves depict how the ideal two-level system behaviour is modified by the combined effects of the finite timing resolution of our detection system and the phase uncertainty of our interferometer (see Methods). All variance extrema we measured are commensurate with these predictions, which confirms that the deviation from the solid curve is predominantly due to limitations of a technical nature. The maximum degree of squeezing we measure is 3.1% ± 1% (0.14 dB) below vacuum noise at an excitation power of s = 0.1. This value corresponds to 40% of the theoretically obtainable limit set by the blue solid curve at this excitation power. The transformation of the state of light with excitation power is best visualized by the calculated Wigner functions presented in Fig. 3b. The spread of these phase-space distributions along a given polar angle φ is indicative of the variance of the corresponding field quadrature $X(\phi) = X_1\cos\phi + X_2\sin\phi$. The Wigner function for the vacuum state (top left) shows a symmetric form with no phase dependence. At intermediate powers (top right and bottom left), the symmetry breaks down and a φ-dependence arises in the spread of the Wigner function, linked to the generation of atomic coherence (see Supplementary Information). This phase dependence, in combination with the antibunched nature of resonance fluorescence, leads to a reduced variance of the electric field for a phase angle of φ = 0 (as seen in, for example, the Wigner function at s = 0.36). In the high-power regime (s → ∞), the field becomes a statistical mixture of the n = 0 and n = 1 Fock states, and the steady-state phase dependence is lost completely.

Figure 3 | Excitation-power dependence of quadrature squeezing. a, The measured (symbols) and theoretical (solid curves) normally ordered quadrature variances as a function of the excitation power s for φ = 0 and φ = π/2. The dashed curves illustrate the effect of phase noise and finite timing resolution in our experiment on an otherwise ideal two-level system (see Supplementary Information). Horizontal error bars, uncertainty in the excitation power; vertical error bars, s.d. of the correlations, as in Fig. 2. b, Wigner functions for different excitation powers. The transition from the vacuum state (s = 0) to a mixture of the vacuum and single photon Fock states (s → ∞) displays non-symmetric forms in intermediate excitation regimes (s = 0.36 and s = 10). For s = 0.36, the spread of the Wigner function along the quadrature variable $X_1$ is clearly less than that of vacuum, a manifestation of quadrature squeezing. The dashed lines depict the contour at 50% of the maximum value at each power.

We have shown that resonance fluorescence from a two-level system can comprise a single photon stream with below-vacuum quantum fluctuations of the field. This appears counter-intuitive, because a well-defined phase cannot be associated with single photons. However, the probabilistic nature of coherent photon scattering in the weak excitation regime allows the emitted photons to be in a coherent superposition of the Fock states |0⟩ and |1⟩. The emergence of phase correlations in this regime endows resonance fluorescence with the coexistence of photon antibunching and quadrature squeezing. Our simultaneous observation of these two phenomena can therefore be interpreted as a quantum mechanical manifestation of the complementary particle and wave natures of light, respectively, with no classical analogues.




Received 1 April; accepted 23 June 2015. 
Published online 31 August 2015. 


1. Schuda, F., Stroud, C. R. Jr & Hercher, M. Observation of the resonant Stark effect at optical frequencies. J. Phys. B 7, L198-L202 (1974).

2. Kimble, H. J., Dagenais, M. & Mandel, L. Photon antibunching in resonance fluorescence. Phys. Rev. Lett. 39, 691-695 (1977).

3. Höffges, J. T., Baldauf, H. W., Lange, W. & Walther, H. Heterodyne measurement of the resonance fluorescence of a single ion. J. Mod. Opt. 44, 1999-2010 (1997).

4. Walls, D. F. & Zoller, P. Reduced quantum fluctuations in resonance fluorescence. Phys. Rev. Lett. 47, 709-711 (1981).

5. Heidmann, A. & Reynaud, S. Squeezing in the many atom resonance fluorescence emitted in the forward direction: application to photon noise reduction. J. Phys. (Paris) 46, 1937-1948 (1985).

6. Lu, Z. H., Bali, S. & Thomas, J. E. Observation of squeezing in the phase-dependent fluorescence spectra of two-level atoms. Phys. Rev. Lett. 81, 3635-3638 (1998).

7. Raizen, M. G., Orozco, L. A., Xiao, M., Boyd, T. L. & Kimble, H. J. Squeezed-state generation by the normal modes of a coupled system. Phys. Rev. Lett. 59, 198-201 (1987).

8. Ourjoumtsev, A. et al. Observation of squeezed light from one atom excited with two photons. Nature 474, 623-626 (2011).

9. Gerber, S. et al. Intensity-field correlation of single-atom resonance fluorescence. Phys. Rev. Lett. 102, 183601 (2009).

10. Ou, Z. Y., Hong, C. K. & Mandel, L. Detection of squeezed states by cross correlation. Phys. Rev. A 36, 192-196 (1987).

11. Vogel, W. Homodyne correlation measurements with weak local oscillators. Phys. Rev. A 51, 4160-4171 (1995).

12. Yuen, H. P. Two-photon coherent states of the radiation field. Phys. Rev. A 13, 2226-2243 (1976).

13. Teich, M. C. & Saleh, B. E. A. Squeezed states of light. Quantum Opt. 1, 153-199 (1989).

14. Walls, D. F. Squeezed states of light. Nature 306, 141-146 (1983).

15. Caves, C. M. Quantum limits on noise in linear amplifiers. Phys. Rev. D 26, 1817-1839 (1982).

16. Goda, K. et al. A quantum-enhanced prototype gravitational-wave detector. Nature Phys. 4, 472-476 (2008).

17. Warburton, R. J. et al. Giant permanent dipole moments of excitons in semiconductor nanostructures. Phys. Rev. B 65, 113303 (2002).

18. Michler, P. et al. Quantum correlation among photons from a single quantum dot at room temperature. Nature 406, 968-970 (2000).

19. Kim, J., Benson, O., Kan, H. & Yamamoto, Y. A single-photon turnstile device. Nature 397, 500-503 (1999).

20. Xu, X. et al. Coherent optical spectroscopy of a strongly driven quantum dot. Science 317, 929-932 (2007).

21. Vamivakas, A. N., Zhao, Y., Lu, C.-Y. & Atatüre, M. Spin-resolved quantum-dot resonance fluorescence. Nature Phys. 5, 198-202 (2009).

22. Flagg, E. B. et al. Resonantly driven coherent oscillations in a solid-state quantum emitter. Nature Phys. 5, 203-207 (2009).

23. Akopian, N. et al. Entangled photon pairs from semiconductor quantum dots. Phys. Rev. Lett. 96, 130501 (2006).

24. Young, R. J. et al. Improved fidelity of triggered entangled photons from single quantum dots. New J. Phys. 8, 29 (2006).

25. Müller, M., Bounouar, S., Jöns, K. D., Glässl, M. & Michler, P. On-demand generation of indistinguishable polarization-entangled photon pairs. Nature Photon. 8, 224-228 (2014).

26. Matthiesen, C., Vamivakas, A. N. & Atatüre, M. Subnatural linewidth single photons from a quantum dot. Phys. Rev. Lett. 108, 093602 (2012).

27. Matthiesen, C. et al. Phase-locked indistinguishable photons with synthesized waveforms from a solid-state source. Nature Commun. 4, 1600 (2013).

28. Kuhlmann, A. V. et al. Charge noise and spin noise in a semiconductor quantum device. Nature Phys. 9, 570-575 (2013).

29. Stanley, M. J. et al. Dynamics of a mesoscopic nuclear spin ensemble interacting with an optically driven electron spin. Phys. Rev. B 90, 195305 (2014).

30. Loudon, R. Squeezing in resonance fluorescence. Opt. Commun. 49, 24-28 (1984).




Supplementary Information is available in the online version of the paper. 


Acknowledgements We acknowledge financial support from the University of 
Cambridge, the European Research Council ERC Consolidator Grant Agreement No. 
617985 and the EU-FP7 Marie Curie Initial Training Network S3NANO. C.M. 
acknowledges Clare College Cambridge for financial support through a Junior 
Research Fellowship. We thank E. Clarke, M. Hugues and the EPSRC National Centre for 
III-V Technologies for the wafer and C. Baune, R. Moghadas Nia, W. Vogel, G. Rempe,
H. J. Carmichael and A. Ourjoumtsev for discussions. 


Author Contributions C.H.H.S. and M.A. devised the experiment, C.H.H.S., J.H., A.E.J.,
C.M. and C.L.G. performed the experiments, C.H.H.S., J.H. and C.L.G. developed the 
models and analysed the data, all authors contributed to the discussion of the results 
and the manuscript preparation. C.H.H.S. and C.M. processed the quantum dot device. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to M.A. (ma424@cam.ac.uk). 


10 SEPTEMBER 2015 | VOL 525 | NATURE | 225 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


METHODS 


Interferometer. A frequency- and power-stabilized single-mode laser is used to 
resonantly excite the neutral exciton transition of the quantum dot. The emitted 
photons are collected in a confocal dark-field microscope, where the laser is 
separated from the quantum dot emission by means of two crossed polarizers. 
The second polarizer is implemented as a polarizing beam splitter, which enables 
the use of the attenuated laser field as local oscillator. The light field in the quantum 
dot arm of the interferometer consists typically of <1% laser photons and >99% 
resonance fluorescence photons; we therefore neglect the laser background in 
the quantum dot photon mode. Likewise, any quantum dot photons in the 
local oscillator output can be neglected because the excitation laser intensity 
before attenuation is several orders of magnitude larger than the resonance fluor- 
escence intensity. In the fringe measurement in Fig. 1a, the spatial path difference 
Af ~ 11cm is kept constant while laser and quantum dot frequency are tuned 
continuously to change the interferometer phase. This form of phase control is 
enabled by the tunability of the quantum dot resonance via the DC Stark effect'” 
and increases the long-term stability of the interferometer, which contains no 
moving parts. The visibility of the interferometer for high-power laser light 
amounts to near unity, but is reduced in Fig. 1b to 73.8%, owing to incoherent 
photon emission as well as an inadvertent mismatch of resonance fluorescence and 
local oscillator intensities. Additionally, the visibility is reduced for low count 
rates’, which makes the use of a bright single photon source and high photon 
collection efficiencies crucial for our experiments. The collection efficiency of our 
optical setup is 1%. This value is calculated from the obtained count rates and the 
0.58-ns excited-state lifetime of the quantum dot used in this work. 
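The path difference is fixed, so tuning the laser and the quantum dot together by δf shifts the relative phase by 2π·δf·Δl/c. The sketch below estimates how far the frequency must be tuned to sweep one full fringe. It assumes propagation with refractive index n = 1 and uses the ≈11 cm path difference quoted above. The function names are illustrative only.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def phase_shift(delta_f_hz: float, path_difference_m: float) -> float:
    """Interferometer phase change (rad) for a common frequency detuning
    delta_f_hz at a fixed path difference, assuming n = 1 in both arms."""
    return 2.0 * math.pi * delta_f_hz * path_difference_m / C


def tuning_per_fringe(path_difference_m: float) -> float:
    """Frequency detuning (Hz) that advances the phase by one fringe (2*pi)."""
    return C / path_difference_m


if __name__ == "__main__":
    dl = 0.11  # ~11 cm path difference quoted in the text
    print(f"{tuning_per_fringe(dl) / 1e9:.2f} GHz per 2*pi")
    print(f"{phase_shift(1e9, dl):.3f} rad for 1 GHz of tuning")
```

With these assumptions a few gigahertz of tuning covers a full fringe. That range is compatible with DC-Stark tuning of the dot, and it explains why no moving parts are needed.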

Post-selection. The intensities of resonance fluorescence and local oscillator fields 
are kept equal in all of our measurements. To ensure the absence of laser photons 
in the quantum dot mode, the laser background is measured once a minute for 2 s. 
To this end, the local oscillator path is blocked and the quantum dot, which is 
embedded in a Schottky diode structure, is tuned off resonance via the quantum 
confined Stark effect (QCSE). Furthermore, the intensity of the quantum dot
emission is monitored continuously during the measurements to detect spectral
wandering of the quantum dot transition. This is done by filtering out the
phonon sideband (PSB) and detecting it on a third single photon detector. The
measured correlation histograms G(2)(τ) are saved once a minute. We perform a


post-selection of histograms with a threshold on the mean PSB count rate and 
another threshold on the measured laser leakage in the quantum dot arm. In the 
experiments shown in Fig. 2, the laser is kept on resonance but the relative phase is 
not actively controlled in the interferometer. Instead, phase-dependent measure- 
ments are performed by using the individual detector intensities as a measure of 
the interferometer phase and relying on the wandering of the phase due to temperature
drifts on timescales of typically ≥30 min per π of phase. To bin the data, we perform
a reference measurement of the interference fringes by scanning the laser
frequency while keeping the quantum dot on resonance using the QCSE. An 
example measurement of the interference fringes obtained in this way may be 
seen in Fig. 1b. To obtain phase bins of equal size, we use intensity bins of varying size
that are proportional to the derivative of cos(φ/2). This phase binning is sign-invariant,
that is, it cannot distinguish between positive and negative phases and
therefore bins data into values between 0 and +π. This does not affect our
measurement because all correlation functions are symmetric in phase around 0. The
data points shown at negative phases in Fig. 2b were measured between φ = 0.5π
and φ = π, and have been shifted by −π.
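The post-selection and phase-binning steps can be sketched as follows. This is a minimal illustration, not the authors' analysis code, and it rests on several assumptions:

- The normalized intensity at one interferometer output is taken to follow a standard cos² fringe, p(φ) = (1 + V cos φ)/2, with visibility V. The text specifies bins proportional to the derivative of cos(φ/2), so the exact mapping used in the paper may differ.
- The threshold values and function names are hypothetical.

Inverting with arccos returns φ in [0, π] only, which reproduces the sign invariance described above.

```python
import numpy as np


def post_select(psb_rates, leakage, psb_min, leakage_max):
    """Mask of one-minute histograms to keep: mean PSB count rate at or above
    psb_min and laser leakage at or below leakage_max.
    Threshold values are hypothetical; the text does not quote them."""
    psb_rates = np.asarray(psb_rates, dtype=float)
    leakage = np.asarray(leakage, dtype=float)
    return (psb_rates >= psb_min) & (leakage <= leakage_max)


def intensity_bin_edges(n_bins, visibility=1.0):
    """Normalized-intensity edges corresponding to n_bins equal phase bins on
    [0, pi], assuming p(phi) = (1 + V cos phi) / 2 (edges decrease with phi)."""
    phi_edges = np.linspace(0.0, np.pi, n_bins + 1)
    return (1.0 + visibility * np.cos(phi_edges)) / 2.0


def intensity_to_phase(p, visibility=1.0):
    """Invert p(phi) to a phase in [0, pi]. cos is even, so +phi and -phi
    give the same intensity: the binning is sign-invariant."""
    x = (2.0 * np.asarray(p, dtype=float) - 1.0) / visibility
    return np.arccos(np.clip(x, -1.0, 1.0))


def phase_bin_index(p, n_bins, visibility=1.0):
    """Index (0 .. n_bins-1) of the equal-width phase bin for each intensity."""
    phi = intensity_to_phase(p, visibility)
    idx = np.floor(phi / (np.pi / n_bins)).astype(int)
    return np.minimum(idx, n_bins - 1)  # phi == pi belongs to the last bin
```

Take, for example, `-np.diff(intensity_bin_edges(6, 0.738))` with the 73.8% visibility quoted for Fig. 1b. The resulting intensity bins are narrow near the fringe extrema and wide near mid-fringe, because the intensity changes slowly with phase at the extrema.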

Reduction of measured degree of squeezing. Although the conditioning nature 
of the measurement should render it robust against low detection efficiencies, this 
is not true in practice, and several effects reduce the measured degree of squeezing 
compared to the theoretical limit for resonance fluorescence. Low photon num- 
bers in the interferometer reduce the fringe visibility, but this only affects the 
signal-to-noise ratio in our measurements. However, low count rates also lead 
to higher shot noise, which can increase the error in the phase binning protocol.
This leads to a decrease in the detected degree of squeezing by introducing mixing 
between quadratures. Other sources of phase noise include spectral wandering of 
the quantum dot transition leading to fluctuations in the dipole phase, and any 
interferometric instability on timescales shorter than the histogram saving time. 
Finite timing resolution of the correlation setup also leads to a decreased visibility 
of the features at zero time delay, and further reduces our measured value for 
squeezing. We have independently measured the timing resolution of the Hanbury 
Brown and Twiss correlation setup with a mode-locked pulsed laser source (<3-ps 
pulse width) for different mean count rates. The extent of phase noise from 
different sources is harder to quantify and is used as a fitting parameter in the 
theoretical curves in Fig. 2. 
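Finite timing resolution lowers the visibility of zero-delay features. One way to see this is to convolve a model correlation function with a Gaussian instrument response. The sketch below makes the following assumptions:

- It uses a toy antibunching dip, g2(τ) = 1 − exp(−|τ|/T1), with the 0.58 ns lifetime quoted above. This dip shape is illustrative and is not the fitted function from Fig. 2.
- The jitter values are hypothetical, not the measured timing resolution.

```python
import numpy as np


def toy_dip(tau_ns, lifetime_ns=0.58):
    """Toy antibunching dip g2(tau) = 1 - exp(-|tau| / T1); illustrative only."""
    return 1.0 - np.exp(-np.abs(tau_ns) / lifetime_ns)


def convolve_with_jitter(tau_ns, g2, fwhm_ns):
    """Convolve g2, sampled on a uniform tau grid, with a unit-area Gaussian
    instrument response of the given FWHM (ns)."""
    dt = tau_ns[1] - tau_ns[0]
    sigma = fwhm_ns / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    n = int(np.ceil(5.0 * sigma / dt))   # kernel half-width of about 5 sigma
    k_tau = np.arange(-n, n + 1) * dt    # symmetric, odd-length kernel grid
    kernel = np.exp(-0.5 * (k_tau / sigma) ** 2)
    kernel /= kernel.sum()               # unit area keeps the level far from tau = 0
    # Zero padding distorts ~5 sigma at each grid edge; tau = 0 is unaffected.
    return np.convolve(g2, kernel, mode="same")


if __name__ == "__main__":
    tau = np.linspace(-10.0, 10.0, 4001)  # 5 ps grid spacing
    g2 = toy_dip(tau)
    i0 = int(np.argmin(np.abs(tau)))
    for fwhm in (0.05, 0.1, 0.3):         # hypothetical jitter values, ns
        blurred = convolve_with_jitter(tau, g2, fwhm)
        print(f"FWHM {fwhm:.2f} ns: g2(0) rises from 0 to {blurred[i0]:.3f}")
```

The depth of the dip at τ = 0 shrinks as the jitter becomes comparable to the lifetime. This is the same qualitative effect that reduces the measured squeezing.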
The most incompressible metal osmium at static
pressures above 750 gigapascals 
Metallic osmium (Os) is one of the most exceptional elemental
materials, having, at ambient pressure, the highest known density
and one of the highest cohesive energies and melting temperatures.
It is also very incompressible, but its high-pressure behaviour
is not well understood because it has been studied so far
only at pressures below 75 gigapascals. Here we report powder
X-ray diffraction measurements on Os at multi-megabar pressures
using both conventional and double-stage diamond anvil cells,
with accurate pressure determination ensured by first obtaining
self-consistent equations of state of gold, platinum, and tungsten
in static experiments up to 500 gigapascals. These measurements
allow us to show that Os retains its hexagonal close-packed structure
upon compression to over 770 gigapascals. But although its
molar volume monotonically decreases with pressure, the unit cell
parameter ratio of Os exhibits anomalies at approximately 150
gigapascals and 440 gigapascals. Dynamical mean-field theory calculations
suggest that the former anomaly is a signature of the
topological change of the Fermi surface for valence electrons.
However, the anomaly at 440 gigapascals might be related to an 
electronic transition associated with pressure-induced interactions 
between core electrons. The ability to affect the core electrons 
under static high-pressure experimental conditions, even for 
incompressible metals such as Os, opens up opportunities to search 
for new states of matter under extreme compression. 

The platinoid 5d transition elements Re, Os, and Ir are the densest
and stiffest metals. Although a short-lived claim that Os is stiffer
than diamond was subsequently disproven, there remains scientific
interest in the high-pressure behaviour of Os: the bulk modulus values
measured by different groups vary substantially (395-435 GPa),
and controversy surrounds reports of a possible pressure-induced isostructural
Lifshitz transition (also called an electronic topological
transition, ETT) in Os. The ETT arises when distortion of the electronic
band structure by an external perturbation results in a topological
modification of the Fermi surface.

Os has a hexagonal close-packed (hcp) structure, with two unit cell 
parameters (a and c) fully defining the atomic arrangement at a given 
pressure. The observation of an anomaly in the compressibility and
pressure dependence of the c/a ratio around 25 GPa was attributed to
an ETT, but subsequent experimental studies at pressures to about
60 GPa failed to detect anomalies and found instead that texturing
or non-hydrostatic conditions can greatly complicate the interpretation
of X-ray diffraction data; indeed, experimental artefacts may
mimic isostructural transitions. Theoretical studies of Os have so
far also resulted in an inconsistent picture of its high-pressure behaviour.
Reports of possible single or multiple anomalies in c/a ratio at
pressures ranging from 9 GPa to over 25 GPa have been both attributed
and not attributed to ETT; one study found evidence for multiple
ETTs at pressures up to 130 GPa that did not affect the compressional
behaviour of Os, and a further study concluded that there are no
peculiarities in the pressure-driven evolution of the atomic and electronic
structure of Os. The inconsistencies in the theoretical high-pressure
behaviour of Os mirror the difficulties encountered when
probing the high-pressure behaviour of 3d hcp metals such as Zn and
ε-Fe (refs 15-19). These difficulties add further interest to detailed
studies of the 5d element Os, to allow a broader comparison of the
crystal chemistry of hcp metals.

Although Os has been experimentally studied at pressures up to
about 75 GPa, this pressure range is far too narrow to explore the
behaviour of one of the most incompressible metals. Data collected
at multi-megabar pressures are highly desired, and can be obtained
using the double-stage diamond anvil cell (dsDAC) technique, which
generates the required ultra-high static pressures. Under such conditions,
pressure determination is based on the equation of state (EOS) of
one or several standards mixed with the sample being studied (see
Methods). The absolute accuracy of the EOS measurements, which
is particularly important when aiming to compare experiment against
theory, cannot be higher than the accuracy of the static pressure scale.
Shock-wave and ramp compression experiments achieve terapascal
pressures, but only at high temperatures. The EOSs obtained
in experiments below 100 GPa, using different dynamic and static
methods and for different standards, tend to agree to within
2-3 GPa, but discrepancies increase with pressure and frequently
reach unacceptable levels of the order of tens of gigapascals at a
pressure around 0.5 TPa (Fig. 1, Extended Data Fig. 1, examples in ref. 22).
Much more accurate results are obtained when using standards with
internally consistent EOSs, especially if the materials used as pressure
markers have different (or, even better, contrasting) elastic properties,
as do gold, platinum, and tungsten.

We conducted experiments in conventional and dsDACs on Au-Pt
mixtures at pressures up to 500 GPa, and on Au-W and Pt-W mixtures
up to approximately 200 GPa (Fig. 1, Extended Data Figs 1, 2).
With Au as the pressure marker, we fitted the pressure-volume
(P-V) data of Pt using the third-order Birch-Murnaghan (Fig. 1)
and the Vinet EOSs (Extended Data Table 1). Both EOSs provide
equally good fits, yielding parameters for Pt very close to those
obtained from shock-wave data (Fig. 1, Extended Data Table 1). Pt
and Au were used as pressure markers in two independent powder
X-ray diffraction experiments to study the EOS of tungsten (Extended
Germany. European Synchrotron Radiation Facility, BP 220, Grenoble F-38043, France. Swedish e-Science Research Centre (SeRC), Linköping University, SE-58183 Linköping, Sweden. Department of
Physics, Chemistry and Biology (IFM), Linköping University, SE-58183 Linköping, Sweden. Centre de Physique Théorique, CNRS, Ecole Polytechnique, 91128 Palaiseau, France. Radboud University,
Institute for Molecules and Materials, Heyendaalseweg 135, 6525AJ Nijmegen, The Netherlands. Department of Theoretical Physics and Applied Mathematics, Ural Federal University, Mira street 19,
Figure 1 | The dependence of the unit cell volume of Pt on pressure P.
Experimental data points (different symbols correspond to different runs, Au
was used as a pressure marker in all experiments, error bars show standard
deviations) were fitted using the third-order Birch-Murnaghan EOS (red line,
volume at ambient temperature and pressure V0 = 60.389(3) Å³ per unit cell,
EOS parameters K300 = 274(2) GPa, K' = 5.23(3)). The blue, green, and pink
lines are from refs 20, 22, and 24, respectively. 


Data Figs 1, 2 and Extended Data Table 1). These experiments yield 
P-V data that are in remarkable agreement with each other and 
provide an accurate EOS of W for pressures up to 200 GPa
(Extended Data Fig. 1 and Extended Data Table 1). Our data are in
close agreement with the EOS of W obtained using a thermodynamic
assessment, but disagree with extrapolations of the EOSs above about
100 GPa derived from experiments below 1 Mbar (Extended Data 
Fig. 1). These results confirm the necessity of EOS calibration well 
above 100 GPa for ultra-high-pressure studies and provide (Extended 
Data Table 1) a self-consistent set of EOSs for Au, Pt, and W that we 
use in our experiments with Os. 
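For reference, the two EOS forms used in the fits can be written as short functions. This is a sketch, not the fitting code used for the paper:

- The third-order Birch-Murnaghan form uses x = (V0/V)^(1/3).
- The Vinet form uses η = (V/V0)^(1/3).
- The Pt values are the Birch-Murnaghan parameters from the Fig. 1 caption.

The Vinet parameters for Pt are listed in Extended Data Table 1, not here, so the Vinet function is shown only generically.

```python
import numpy as np


def birch_murnaghan3(v, v0, k0, k0p):
    """Third-order Birch-Murnaghan pressure (same units as k0) at volume v."""
    x = (v0 / np.asarray(v, dtype=float)) ** (1.0 / 3.0)  # x = (V0/V)^(1/3)
    return 1.5 * k0 * (x**7 - x**5) * (1.0 + 0.75 * (k0p - 4.0) * (x**2 - 1.0))


def vinet(v, v0, k0, k0p):
    """Vinet pressure (same units as k0) at volume v."""
    eta = (np.asarray(v, dtype=float) / v0) ** (1.0 / 3.0)  # eta = (V/V0)^(1/3)
    return 3.0 * k0 * (1.0 - eta) / eta**2 * np.exp(1.5 * (k0p - 1.0) * (1.0 - eta))


# Pt Birch-Murnaghan parameters from the Fig. 1 caption.
PT_V0, PT_K0, PT_K0P = 60.389, 274.0, 5.23  # A^3 per unit cell, GPa, dimensionless

if __name__ == "__main__":
    for frac in (0.9, 0.8, 0.7):
        p = birch_murnaghan3(frac * PT_V0, PT_V0, PT_K0, PT_K0P)
        print(f"V/V0 = {frac:.1f}: P = {float(p):.1f} GPa")
```

Both forms give zero pressure at V0, and both give an isothermal bulk modulus −V dP/dV equal to K0 there. This is a quick sanity check when transcribing fitted parameters.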

We studied the compressional behaviour of Os using powder 
X-ray diffraction experiments in conventional DACs at pressures 
up to about 200 GPa that were conducted in Ne (Au and W pressure 
markers, Extended Data Fig. 2) or He (Pt pressure marker). Similar 
experiments in a dsDAC reached a pressure of 774(10) GPa (accord- 
ing to the Os EOS; see below), where the number in parentheses is 
one standard deviation; Os adopted the hcp structure even at this 
pressure (Fig. 2). The only successful experiments in dsDACs with a
pressure marker were those using the Os-W mixture (Fig. 2), where we
achieved a pressure slightly above 500 GPa. The data collected in 
different runs are consistent (Fig. 3), and the dependence of the unit 
cell volume on pressure does not show any obvious anomaly and is 
well described by the third-order Birch-Murnaghan (Fig. 3a) or 
Vinet EOSs (Extended Data Table 1). The bulk modulus (399(6) 
GPa fitted with the third-order Birch-Murnaghan EOS) is in agree- 
ment with the results of refs 2 (Fig. 3a) and 3 (Extended Data Fig. 3). 
We do, however, observe two anomalies at approximately 150 GPa 
and 440 GPa in the ratio of the lattice parameters, c/a, when com- 
pressing Os (Fig. 3b). Although these are at the detection limit of 
our experimental set-up, the anomaly at about 150 GPa was repro- 
duced in three independent runs, and the one above 400 GPa in two 
runs. Fitting of the P-V data by the third-order Birch-Murnaghan
EOS within three pressure ranges (below 120 GPa, between 170 GPa
and 400 GPa, and above 400 GPa) gives an interesting result:
although fitting over the first and second pressure ranges produces
similar EOS parameters (K300 = 397(3) GPa, K' = 4.07(4) and
K300 = 416(5) GPa, K' = 3.8(1), respectively, where the subscript
indicates room temperature and the prime denotes differentiation
with respect to pressure), we obtain different parameters at pressures
above 400 GPa (K300 = 293(5) GPa, K' = 5.4(1)). These experimentally
observed peculiarities are not artefacts and require an explanation:
substantial changes of compressibility with pressure are quite
Figure 2 | Diffraction patterns of the samples compressed in dsDACs.
Lower curve, mixture of Os (a = 2.3232(3) Å, c = 3.7099(11) Å) and W
(a = 2.6649(2) Å, pressure marker); X-ray wavelength of 0.3344 Å. Upper
curve, Os (a = 2.2488(2) Å, c = 3.555(2) Å, pressure determined from our Os
EOS); X-ray wavelength of 0.2898 Å. Reflections of highly textured Re at
approximately 110 GPa are due to the Re support of the secondary anvil.
Experimental data are shown by red dots; continuous blue curves are
simulations using the full-profile (GSAS) software. θ is the diffraction angle;
the labels on the peaks indicate the Miller indices of the corresponding 
diffraction reflection of the given metal; diffraction lines of secondary anvils are 
not visible; the pressures given above the curves designate at which pressure in a 
dsDAC the diffraction patterns were collected. 


common for materials undergoing pressure-induced spin crossover”, 
but are not expected for heavy 5d elements such as Os. 
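As a cross-check, the Os unit cell volume follows from the hcp lattice parameters as V = (√3/2)a²c. The fitted Birch-Murnaghan parameters from the Fig. 3 caption then give a pressure. This minimal sketch assumes that the global fit (V0 = 28.02 Å³, K300 = 399 GPa, K' = 4.04) applies over the whole range. The function names are illustrative.

With the lattice parameters of the upper pattern in Fig. 2, the result is close to the quoted 774(10) GPa. That agreement is expected, since the pressure was determined from the Os EOS. For the Os-W pattern, the sketch gives a little above 500 GPa, in line with the W-marker value.

```python
import math


def hcp_cell_volume(a, c):
    """Volume of the two-atom hcp unit cell, V = (sqrt(3)/2) a^2 c."""
    return math.sqrt(3.0) / 2.0 * a * a * c


def bm3_pressure(v, v0, k0, k0p):
    """Third-order Birch-Murnaghan pressure (same units as k0)."""
    x = (v0 / v) ** (1.0 / 3.0)
    return 1.5 * k0 * (x**7 - x**5) * (1.0 + 0.75 * (k0p - 4.0) * (x * x - 1.0))


# Os EOS from the Fig. 3 caption: V0 (A^3 per cell), K300 (GPa), K'.
OS_V0, OS_K0, OS_K0P = 28.02, 399.0, 4.04

# Os lattice parameters (A) from the Fig. 2 caption.
patterns = {
    "upper (Os only)": (2.2488, 3.555),
    "lower (Os-W mixture)": (2.3232, 3.7099),
}

for label, (a, c) in patterns.items():
    v = hcp_cell_volume(a, c)
    p = bm3_pressure(v, OS_V0, OS_K0, OS_K0P)
    print(f"{label}: V = {v:.3f} A^3, c/a = {c / a:.4f}, P ~ {p:.0f} GPa")
```

Applying the separate high-pressure fit parameters quoted above instead of the global fit would shift these numbers somewhat. That shift is one way to gauge how much the anomaly matters for pressure assignment.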

The behaviour of the c/a ratio of Os suggests that an ETT is the most
likely explanation for the observed peculiarities. In first-principles
electronic structure calculations for Os at different compressions in
the framework of dynamical mean-field theory (see Methods), we
observe two ETTs at the Γ and L points of the hcp Brillouin zone
at pressures of approximately 100 GPa and 180 GPa, respectively
(Extended Data Figs 4, 5). This result agrees with the pressure range
in which the first c/a ratio anomaly appears experimentally (Fig. 3b).
The behaviour of the bands at the Γ and L points for increasing pressure is
remarkably similar to that observed recently in the isoelectronic hcp
Fe (ref. 19). We observe a rather strong influence of many-electron
effects beyond the local density approximation within density functional
theory (DFT) on the band structure at the Γ and L points even
though Os can be classified as a weakly correlated metal (Extended 
Data Fig. 4). The inclusion of correlations between 5d electrons moves 
the ETTs at these high symmetry points to higher pressure (Extended 
Data Fig. 5), improving the agreement with experiment. However, 
increasing the pressure further does not lead to any new ETTs up to 
a pressure P = 477 GPa (Extended Data Fig. 6), even when including 
the spin-orbit interaction (Methods and Extended Data Figs 7, 8). 
Consequently, the origin of the second anomaly in the c/a ratio seen 
in Fig. 3b remains unknown. 

In solid metals, the outermost (valence) electrons are no longer 
associated with their respective atoms and instead form electronic 
bands that bond the atoms together. Because the inner-core electrons 
remain tightly bound to their nuclei and do not contribute to the 
bonds, they are often considered irrelevant when determining the 
properties of the material. But compression increases the overlap 
between the electronic clouds, as seen in the plot of the electronic 
density of states (DOS) of Os in Fig. 4, where the low-lying localized 
5p and 4f states start to interact with each other at P = 392 GPa. This 
Figure 3 | Experimental dependence of the unit cell volume and the ratio of
the lattice parameters c/a of Os on pressure P. a, b, Different symbols
correspond to different pressure markers in different runs (solid magenta dots,
Au and W pressure markers; green stars, Pt marker; red diamonds, W pressure
marker; blue square, Os itself; error bars show standard deviations).
Experimental data points were fitted using the third-order Birch-Murnaghan
EOS (blue line, V0 = 28.02(4) Å³ per unit cell, K300 = 399(6) GPa,
K' = 4.04(4)). The magenta and green lines are from refs 3 and 4, respectively.


pressure for the core-level crossing (CLC) transition is in good agree- 
ment with the pressure at which we see the c/a ratio anomaly in our 
experiments, suggesting that interactions between 5p and 4f states 
might cause the observed peculiarity. 

The effect of the CLC transition on thermodynamic properties must
be indirect. Using the pseudopotential transformation, one can
establish a relation between the smooth part of the valence orbitals
$\varphi_i$, which determine the bonding, and the core orbitals $\psi_j$. The former
are solutions of the Schrödinger-like equation

$$\left[-\frac{\hbar^{2}}{2m}\nabla^{2} + \hat{V}\right]\varphi_i(\mathbf{r}) = \varepsilon_i\,\varphi_i(\mathbf{r}) \qquad (1)$$

where $\hbar$ is the reduced Planck constant, $m$ is the mass of the electron,
$\mathbf{r}$ is the position vector, $\varepsilon_i$ are the eigenvalues of the orbitals,
and $\hat{V} = V + \hat{V}^{R}$ is the pseudopotential, with $V$ an effective potential
that acts on the electron. The non-local operator $\hat{V}^{R}$ acts upon $\varphi_i$
according to

$$\hat{V}^{R}\varphi_i(\mathbf{r}) = \sum_{j}(\varepsilon_i - \varepsilon_j)\,\langle\psi_j|\varphi_i\rangle\,\psi_j(\mathbf{r}) \qquad (2)$$

Well-separated and atomic-like core states are often considered ‘frozen’,
with no pressure-dependent influence on the valence states; this
explains the success of modern pseudopotential approaches when
studying matter under extreme conditions. But substantial reconstruction
of inner 5p and 4f states can affect chemical bonding and thereby
structural properties of solids via modification of the non-local operator
$\hat{V}^{R}$ in equation (2).
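The defining property of equation (2) can be checked numerically in a toy model. In the sketch below, a random symmetric matrix stands in for −(ħ²/2m)∇² + V in a finite basis; this is purely illustrative and not an electronic structure calculation. Its two lowest eigenvectors play the core states ψ_j, and the next one plays a valence state.

Adding an arbitrary core admixture to the valence state leaves ε_i as an eigenvalue once $\hat{V}^{R}$ is included. Meanwhile the core energies ε_j and orbitals ψ_j enter $\hat{V}^{R}$ directly, which is the route by which core-state reconstruction can reach the valence electrons.

```python
import numpy as np


def pk_repulsive_operator(core_vecs, core_energies, eps):
    """Matrix of V^R = sum_j (eps - eps_j) |psi_j><psi_j| for a valence
    eigenvalue eps; columns of core_vecs are orthonormal core states psi_j."""
    shifts = eps - np.asarray(core_energies)
    return (core_vecs * shifts) @ core_vecs.conj().T


rng = np.random.default_rng(0)
n = 8
a = rng.normal(size=(n, n))
h = (a + a.T) / 2.0                      # toy Hermitian stand-in for T + V
energies, vecs = np.linalg.eigh(h)       # eigenvalues in ascending order
core, e_core = vecs[:, :2], energies[:2] # two lowest states act as "core"
psi_v, e_v = vecs[:, 2], energies[2]     # next state acts as the valence orbital

# Pseudo-orbital: valence state plus an arbitrary admixture of core states.
phi = psi_v + core @ np.array([0.7, -0.3])

v_r = pk_repulsive_operator(core, e_core, e_v)
residual = (h + v_r) @ phi - e_v * phi
print(f"max |(H + V^R) phi - eps phi| = {np.max(np.abs(residual)):.1e}")
```

The residual sits at round-off level. Dropping `v_r` makes it finite, because φ alone is not an eigenvector of the bare toy Hamiltonian.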
Figure 4 | Calculated electronic density of states (DOS) of Os as a function
of energy E (relative to the Fermi energy E_F). a, b, DOS at pressures of 0 GPa
(a) and 392 GPa (b). Experimental lattice parameters were used in the
calculations. 6s, 6p, 5d, and 5f electrons form well-defined bands near the Fermi
energy at all the pressures examined in this study. 5p electrons occupy 5p1/2
and 5p3/2 states, which are split owing to spin-orbit interaction; 4f electrons
occupy 4f5/2 and 4f7/2 states. They behave as core electrons forming fully
localized states at P = 0 GPa. However, at P = 392 GPa, the 5p states are
broadened. Importantly, one clearly sees in b that 5p3/2 and 4f7/2 states start to
interact with each other, and the CLC transition takes place. This interaction 
might be responsible for the peculiarity observed in our experiments at 
ultra-high pressure. 


Even in the simplest model of polarized ionic cores of small radius,
where the pseudopotential reduces to a local operator, its Fourier
component v(q) is
where Z is the ionic charge, q is the wave vector of the Fourier component
of the pseudopotential, v_c(q) = 4πe²/q² is the Fourier component
of the Coulomb interaction, k is the electron wave vector, ε_k is the
dispersion relation for the conduction electrons, ω_p is their plasma
frequency, and α(iω) is the ion core polarizability at imaginary frequencies.
In this model, reconstruction of the inner states affects the
electron-ion interaction via the function α(iω); Os under ultra-high
pressure seems to provide the first real example of this kind of effect. In
a sense this effect resembles the well-known phenomenon of atomic
collapse, where, as one moves across the periodic table, f electron
states move from their outer effective potential well to the inner well
(which explains the existence of rare-earth and actinide groups).
Although the effects we document are much weaker, the unavoidable
strong mixing of 4f and 5p states at the CLC changes atomic polarizability
and therefore the potential of the ion-electron interaction.
Figure 4b shows that the overlap of 5p states increases with pressure, 
which equates to an increase of their effective radius. This should make 
their contribution to the ionic polarizability larger, and translate the 
effect of the CLC—a transition that is expected to be common, at least 
for the transition metals of the sixth period—to the valence electrons. 

Relating electronic transitions to lattice parameter anomalies is gen- 
erally nontrivial (Methods). For instance, a conventional ETT caused 
by a Fermi surface topology change need not result in a visible anomaly 
in the pressure dependence of lattice parameters at temperature T = 0 
(Extended Data Figs 9, 10, Extended Data Table 1) because the ETT 
gives rise to a pronounced anomaly in the third derivative of the 
thermodynamic potential and induces some kinks in the second derivative
(this is the reason it is often called “the second-and-a-half”-order
transition). But the ETT gives rise to a divergence in the thermal
expansion, resulting in an observable anomaly in the c/a ratio at finite 
temperatures (Methods). A similar situation may exist regarding the 
new type of electronic transition observed in our experiments at ultra- 
high pressure, even though the effect of the crossing of deep 4f and 5p 
levels should be weaker than the effect arising from the Fermi surface 
topology modification; the experimentally observed anomaly in the c/a 
ratio at approximately 440 GPa is weaker than the one at approxi- 
mately 150 GPa (Fig. 3b). 

Our findings demonstrate that extreme compression can change the 
nature of core electrons. This effect has been examined in static high- 
pressure experiments for soft simple metals like Li (ref. 29) and Na (ref. 
30), but recent ramp compression experiments on diamond have
indicated that qualitatively new static pressure levels are needed to
affect the core electrons of less compressible transition metals or covalently
bonded materials. By compressing Os, one of the most incompressible
metals, to over 770 GPa, we were able to access this regime and
observe a new type of electronic transition, the CLC transition, that 
involves pressure-induced interactions between core electrons, and 
leads to observable changes of the material properties. We believe that 
the ability to reach sufficiently high pressure levels to affect the core 
electrons of transition metals in static high-pressure experiments will 
open up opportunities in the search for new states of matter. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Materials. Au (99.9995% purity, 1-μm particle size), Pt (99.999% purity, 2-μm
particle size), W (99.999% purity, 1-μm particle size), and Os (99.99% purity,
1-μm particle size) powders were purchased from Goodfellow Inc.

Diamond anvil cell experiments. For all our experiments we used piston-cylinder 
BX90 type DACs with a large optical aperture produced at BGI. Diamonds with
culet sizes of 250 μm or 300 μm were used in conventional DAC experiments
below 70 GPa or as primary anvils in double-stage DACs. Compression of Au,
Pt, W, and Os powders in a He or Ne pressure medium above 100 GPa was
conducted with bevelled diamonds with 120-μm culets. The secondary anvils were
produced by direct conversion of glassy carbon balls with diameters of 10-20 μm.
Re or Ir were used as gasket materials. Gaskets were indented to a thickness of
20-35 μm in different experiments and holes with diameters of 125-150 μm were
drilled into the centre of the indentation. As pressure transmitting media, He
and Ne were loaded at pressures of 1.2-1.4 kbar; in some experiments with
double-stage DACs, liquid paraffin wax was used. Cavities in diamond anvils for
experiments with double-stage DACs were made by a picosecond pulsed laser.

X-ray diffraction measurements. We conducted in situ X-ray high-pressure 
experiments at the Bayerisches Geoinstitut (Germany), at ID09 at ESRF 
(France), at ECB at PETRA III (Germany), and at 13-IDD at Advanced Photon 
Source, APS (USA). At the Bayerisches Geoinstitut, we obtained powder X-ray 
diffraction data with a system consisting of a Rigaku FRD high-brilliance generator
(90 kW) and APEX CCD Area Detector. The MoKα radiation (tube voltage 60 kV,
tube current 55 mA, cathode gun 0.1 × 0.1 mm²) was focused with MaxFlux X-ray
optics and further collimated down to a FWHM beam size of about 30 μm. At
ID09 at ESRF, the data were collected with the MAR555 detector using an X-ray
beam with a wavelength of approximately 0.41 Å and a beam size down to
5 × 5 μm². At the ECB at PETRA III, the data were collected with the Perkin
Elmer detector using an X-ray beam with a wavelength of approximately 0.29 Å
and a beam size as small as 1.5 × 1.5 μm². At the 13-IDD station (GSECARS), we
used a MAR-165 CCD area detector and a highly focused beam (about 3 × 4 μm²)
with a wavelength of 0.3344 Å. The collected images were integrated using the
Fit2D or GADDS programs to obtain a conventional diffraction pattern. Data 
analysis was conducted using the GSAS package. 
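The reflections indexed in Fig. 2 follow from the hcp metric and Bragg's law. The sketch below computes d-spacings from 1/d² = (4/3)(h² + hk + k²)/a² + l²/c² and converts them to 2θ. It uses the Os lattice parameters and wavelength quoted in the Fig. 2 caption. The function names are illustrative, and this is a textbook calculation, not the GSAS refinement.

```python
import math


def hcp_d_spacing(h, k, l, a, c):
    """Interplanar spacing for an hcp lattice (same length unit as a and c):
    1/d^2 = (4/3)(h^2 + h*k + k^2)/a^2 + l^2/c^2."""
    inv_d2 = 4.0 / 3.0 * (h * h + h * k + k * k) / a**2 + (l * l) / c**2
    return 1.0 / math.sqrt(inv_d2)


def two_theta_deg(d, wavelength):
    """Diffraction angle 2*theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


if __name__ == "__main__":
    # Os lattice parameters and X-ray wavelength (A) from the Fig. 2 upper pattern.
    a, c, lam = 2.2488, 3.555, 0.2898
    for hkl in [(1, 0, 0), (0, 0, 2), (1, 0, 1), (1, 0, 2), (1, 1, 0)]:
        d = hcp_d_spacing(*hkl, a, c)
        print(hkl, f"d = {d:.4f} A, 2theta = {two_theta_deg(d, lam):.3f} deg")
```

Comparing such predicted positions with the observed peaks is a quick way to check indexing before a full-profile refinement.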

Computational details. To include electronic correlation effects for the partially
filled Os 5d band beyond the standard DFT framework we used a local density
approximation + dynamical mean-field theory (LDA+DMFT) approach.
This approach is based on a full-potential linear augmented plane-wave + local
orbitals technique as implemented in the Wien2k code in conjunction with the
DMFT implementation provided by the TRIQS package. Our LDA+DMFT
framework is fully self-consistent in the charge density. The LDA+DMFT calculations
were performed within the scalar-relativistic approximation and using a
k-mesh with 32 × 32 × 32 points in the full Brillouin zone. The spin-orbit coupling
was not included because LDA calculations show that it has a negligible
effect on the electronic structure in the vicinity of the Fermi level. The DMFT
quantum impurity problem was solved using the numerically exact imaginary-time
hybridization-expansion continuous-time quantum Monte Carlo
(CT-QMC) method. A large number of Monte Carlo cycles, more than 512 million,
were performed to obtain a well-converged DMFT local self-energy. We adopted a
stochastic version of the maximum entropy method for the analytical continuation
of the CT-QMC self-energy to the real frequency axis. For the Coulomb
interaction strength U and Hund’s coupling constant J we used the values
U = 2.8 eV and J = 0.55 eV that are estimated in ref. 37. The qualitative results
of our LDA+DMFT calculations are not very sensitive to the exact values of U and
J. We used the ‘around mean-field’ form for the double counting correction,
which is suitable for weakly correlated metallic systems. 

In calculations of band structure at the level of DFT within the LDA or semilocal
generalized gradient approximation (GGA), we used two complementary
methods, the full potential (linear) augmented plane waves + local orbitals
method as implemented in the Wien2k code and the electronic-structure
method RSPt. Both are all-electron methods, which do not impose any approximations
on the shape of the one-electron potential, and they are known to generate
very similar results. The former method allows us to directly compare the
LDA and LDA+DMFT results. These methods are particularly suited to high-pressure
calculations because the basis functions for any energy, including nominally
deep core states, can be treated as ‘valence’ states.

For calculations with the Wien2k code, we used a k-mesh consisting of
32 × 32 × 32 k-points in the full Brillouin zone. The size of the plane-wave
basis set is given by the cutoff parameter Kmax. In our calculations, we kept the
product of Kmax and the radius of the real-space muffin-tin spheres at
Kmax × R_MT = 10. At pressures of 0 GPa, 134 GPa, 247 GPa, and 477 GPa, we
set R_MT = 2.5 atomic units (a.u.), 2.34 a.u., 2.27 a.u., and 2.16 a.u., respectively.
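Keeping R_MT × Kmax fixed while the muffin-tin radius shrinks under compression means the plane-wave cutoff grows. The short calculation below simply evaluates Kmax = 10/R_MT for the radii listed above, in atomic units (bohr). It is arithmetic on the stated values, not output from Wien2k, and the names are illustrative.

```python
RKMAX = 10.0  # product R_MT * K_max kept fixed in the text

# Muffin-tin radius (bohr) used at each pressure (GPa), as listed above.
R_MT_BY_PRESSURE = {0: 2.50, 134: 2.34, 247: 2.27, 477: 2.16}


def k_max(r_mt, rkmax=RKMAX):
    """Plane-wave cutoff K_max (bohr^-1) implied by a fixed R_MT * K_max."""
    return rkmax / r_mt


if __name__ == "__main__":
    for p, r in R_MT_BY_PRESSURE.items():
        print(f"{p:>3} GPa: R_MT = {r:.2f} a.u. -> K_max = {k_max(r):.2f} a.u.^-1")
```

At fixed R_MT × Kmax, the basis therefore becomes larger at the highest pressures, which keeps the interstitial description comparable as the spheres shrink.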


Wien2k LDA results give the most direct comparison to our LDA+DMFT results,
because the same computational scheme at the LDA level is used.

To calculate the band-projected density of states (DOS) we used RSPt. RSPt is
an all-electron, full-potential electronic structure method that uses a basis of site-centred
spherical waves (a generalization of augmented muffin-tin orbitals) in the
self-consistent Kohn-Sham formalism to calculate the electron density and total
energy. We used LDA, as well as two gradient corrected functionals, AM05 and
PBE, well-known examples of the accuracy DFT can achieve. We used RSPt to
predict the electron DOS of Os in the hexagonal close-packed (hcp) structure at
pressures up to about 700 GPa. At these pressures, we find that a basis corresponding
to 4f, 5s, 5p, 5d, 5f, 6s, and 6p atomic states is sufficiently complete; adding
4d functions to the valence yields negligible changes in the calculated DOS. Bases
are scalar-relativistic: the spin-orbit interaction is included variationally.

Calculations of band structure and DOS were carried out at the experimental
values of the lattice parameters for each value of pressure. For a few lattice parameters,
the calculations were carried out using both the Wien2k code with LDA,
and the RSPt code with GGA-PBE; the obtained DOS are very similar. The electronic
structure calculated for a fixed lattice parameter is known to be quite
insensitive to the use of LDA or GGA in the calculations. When calculating
the c/a ratio using RSPt, enthalpy was optimized at fixed pressure on a grid of 
pressures. Using Wien2k, energy was minimized at fixed volume. 
Influence of correlation effects. To underline the importance of correlation 
effects in calculations of the electronic structure of Os, we compare the results 
of LDA+DMFT and LDA calculations in Extended Data Figs 4 and 5. The difference
is quite noticeable. In our LDA calculations, we do not observe any ETTs at
the Γ point as pressure increases, because at 0 GPa the corresponding band is
already well above the Fermi level. This result is in contrast to that of ref. 13, where
an ETT was found at this point, but in agreement with ref. 14, in which no ETT was
reported. This discrepancy is due to the different values for the lattice constant that
were used in the LDA calculations. We used experimental room-temperature
values of the lattice parameters. Using the LDA lattice constants from ref. 13,
we recover the band energy below the Fermi level at the Γ point. Assuming
the GGA lattice constants from ref. 14, we reproduce the results of this work at
the Γ point.

Our LDA calculations also predict that the L-point ETT occurs at a much lower pressure, around 100 GPa, than in our LDA+DMFT calculations (Extended Data Figs 4 and 5). In ref. 14, this band at the L point was predicted to lie just below the Fermi level at 129 GPa, and no ETTs were reported in this pressure range. Using the same lattice constants as in ref. 14, we reproduce these results within LDA. We also find that along the L-H line the band energy is just above the Fermi level, but this part of the Brillouin zone is not shown in ref. 14. Thus, we attribute the discrepancies between the LDA- and GGA-based studies to differences in the EOS rather than to the exchange-correlation potential.

The discussion above shows that the electronic structure at the Γ and L points is quite sensitive to volume changes, and that both the occurrence and the position of ETTs in the LDA/GGA picture depend sensitively on the accuracy of the assumed EOS. The accuracy of the calculated EOS of Os depends on the approximation used for the electron-electron interactions, as discussed below. In view of this uncertainty, the most reliable description of the electronic structure is obtained using the experimentally measured lattice parameters. We adopted this strategy and show all electronic structure plots at the experimental lattice parameters. We did not detect any substantial difference between the electronic structures calculated with the Wien2k and RSPt codes.
Influence of relativistic effects. Because Os is a heavy element, the importance of spin-orbit coupling (SOC) must be investigated. Using LDA, we calculated the band structure both in the scalar-relativistic approximation and with SOC included, using the Wien2k code (ref. 40); the results are shown in Extended Data Figs 7 and 8. Some bands are split when SOC is included. However, no new features appear in the immediate vicinity of the Fermi level. In both cases, we find that one ETT has already taken place at the L point of the hcp Brillouin zone at a pressure of 134 GPa (Extended Data Fig. 7). We do not see any new ETTs on increasing the pressure to 477 GPa (Extended Data Fig. 8). Instead, the agreement between LDA and LDA+DMFT improves at high pressure, as expected, because correlation effects become less important with increasing pressure. This agreement indicates the internal consistency of our calculations.
Calculated equation of state. The calculations of the EOS and the lattice parameters using the LDA+DMFT approach are very time-consuming, and their numerical accuracy is insufficient to resolve weak peculiarities of the lattice parameters, such as those of the c/a ratio. We therefore focus on the results obtained within the LDA and GGA of DFT and compare them with experiment, as well as with data available in the literature (see Extended Data Figs 9 and 10 and Extended Data Table 1).
Relationship between electronic transitions and anomalies in lattice parameters. Let us first consider an ETT, that is, a change of the Fermi-surface topology. Although we have shown the importance of many-electron effects for hcp Os, they mainly influence band positions at the Γ and L points, while the metal remains weakly correlated. Thus, we can use the one-electron picture for a qualitative discussion. For three-dimensional systems, the main effect of correlations on the ETT is a change of the coefficients at the singularities. The character of the anomalies due to the ETT differs between low temperatures (lower than typical phonon energies) and high ones. The initial anomaly is in the DOS at the Fermi energy, which within the one-electron picture is a square-root singularity, for example, $\delta N(E_F) \propto \sqrt{z}\,\theta(z)$, where $z = E_F - E_c$ is the distance between the Fermi energy $E_F$ and the Van Hove singularity $E_c$, and $\theta(z)$ is the Heaviside function. In the case of the appearance of a new hole pocket below the critical volume $V_{\mathrm{crit}}$, the change in the DOS is $\delta N(E_F) \propto (V_{\mathrm{crit}} - V)^{1/2}$. The anomaly yields a sharp peculiarity in the third derivative of the thermodynamic potential Ω and induces kinks in the second derivative. However, it does not necessarily lead to a visible peculiarity in the pressure dependence of the lattice parameters at T = 0 K, in agreement with our calculations (Extended Data Fig. 10). Still, in hcp metals the effect of the ETT on the lattice parameters can be detected experimentally at finite temperature, owing to the anisotropy of the thermal expansion coefficients $\alpha_c$ and $\alpha_a$ along the c and a directions of the crystal lattice, respectively.
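The square-root form of the DOS anomaly can be made concrete with a short numerical sketch. It assumes, as a simplification not stated in the text, that close to the transition z depends linearly on volume, z = k(V_crit - V) with k > 0; the prefactor, k and all volumes are hypothetical and purely illustrative.

```python
import math

def dos_anomaly(z: float, amplitude: float = 1.0) -> float:
    """ETT correction to the DOS at E_F: deltaN ~ amplitude * sqrt(z) * theta(z).

    z = E_F - E_c; `amplitude` is a hypothetical prefactor (1 by default).
    """
    # sqrt(max(z, 0)) equals sqrt(z) * theta(z) and never takes sqrt of a negative.
    return amplitude * math.sqrt(max(z, 0.0))

def dos_anomaly_vs_volume(volume: float, v_crit: float, k: float = 1.0) -> float:
    """Same anomaly in terms of volume, assuming (hypothetically) z = k * (v_crit - V)."""
    return dos_anomaly(k * (v_crit - volume))

if __name__ == "__main__":
    v_crit = 25.0  # hypothetical critical volume, arbitrary units
    for v in (25.5, 25.0, 24.9, 24.6, 24.0):
        print(f"V = {v:5.2f}  deltaN = {dos_anomaly_vs_volume(v, v_crit):.3f}")
```

Above V_crit the correction is zero; just below it, it rises with infinite slope, which is why the effect shows up in higher derivatives of the thermodynamic potential rather than as a jump.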

Indeed, $\alpha_c$ and $\alpha_a$ can be evaluated from the phonon ($F^{\mathrm{ph}}$) and electron ($F^{\mathrm{el}}$) contributions to the free energy of the hcp metal:
where $F^{*} = F^{\mathrm{ph}} + F^{\mathrm{el}}$, T is the temperature, V is the volume, $d\varepsilon_1 = d\ln V$, $d\varepsilon_2 = d\ln(c/a)$, and the coefficients $B_{ij}$ are defined via the hcp elastic constants $C_{ij}$ according to

$$B_{11} = \frac{2}{9}\left(C_{11} + C_{12} + \frac{1}{2}C_{33} + 2C_{13}\right)$$
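As a quick consistency check on the coefficient B11, read here as B11 = (2/9)(C11 + C12 + C33/2 + 2C13), the sketch below evaluates it and confirms that it reduces to the cubic bulk modulus (C11 + 2C12)/3 when C33 = C11 and C13 = C12. This expression is the Voigt-average bulk modulus of a hexagonal crystal. The elastic constants used are hypothetical round numbers, not values for Os.

```python
def b11_hcp(c11: float, c12: float, c13: float, c33: float) -> float:
    """B11 = (2/9) * (C11 + C12 + C33/2 + 2*C13) for an hcp crystal (inputs in GPa).

    This coincides with the Voigt-average bulk modulus of a hexagonal crystal.
    """
    return (2.0 / 9.0) * (c11 + c12 + 0.5 * c33 + 2.0 * c13)

def cubic_bulk_modulus(c11: float, c12: float) -> float:
    """Bulk modulus (C11 + 2*C12) / 3 of a cubic crystal, used as a limiting check."""
    return (c11 + 2.0 * c12) / 3.0

if __name__ == "__main__":
    # Hypothetical elastic constants in GPa, for illustration only (not Os data).
    print(b11_hcp(c11=700.0, c12=250.0, c13=220.0, c33=800.0))
    # Cubic-like limit: C33 = C11 and C13 = C12 should reproduce (C11 + 2*C12)/3.
    print(b11_hcp(300.0, 150.0, 150.0, 300.0), cubic_bulk_modulus(300.0, 150.0))
```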
Thus, the anisotropy of the thermal expansion is an inherent property of an hcp metal, but it should become stronger at the point of the ETT. The ETT contribution to the electronic Grüneisen parameter, and thus to the electronic thermal expansion coefficient, is proportional to $d\,\delta N(E_F)/du \propto 1/\sqrt{z}$ (where $u$ is the dilatation) and diverges at the point of the ETT. However, this contribution is dominant only at very low temperatures (typically of the order of 10 K or below), so the phonon contribution must also be considered. For acoustic phonons with small wave vectors, the Grüneisen parameter, and thus the thermal expansion coefficient, is proportional to $\partial C_{ij}/\partial u$ (typically only one shear modulus is especially sensitive to the ETT, and it is this modulus that should be considered). Because $\delta C_{ij} \propto \delta N(E_F)$ (ref. 51), the low-temperature phonon thermal expansion coefficient is as singular as the electronic one. The anomaly in phonon frequencies throughout the Brillouin zone, except in the vicinity of the Γ point, is determined by the change of the effective interatomic potential via the screening anomaly, which is weaker and proportional to $(-z)^{3/2}\theta(-z)$ (ref. 48); as a result, at high enough temperatures, we expect a square-root singularity in the thermal expansion coefficient.
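To see why the low-temperature and high-temperature contributions differ in strength, the sketch below differentiates the two model anomalies numerically with respect to z, which is taken (as an assumption) to be proportional to the dilatation u. The DOS-type term sqrt(z)θ(z) gives a derivative that diverges as 1/(2 sqrt(z)), whereas the screening term (-z)^{3/2}θ(-z) gives a derivative of -(3/2)sqrt(-z), which vanishes at the transition: the weaker, square-root singularity. Amplitudes are set to 1 for illustration.

```python
import math

def dos_term(z: float) -> float:
    """Low-temperature (electronic / acoustic-phonon) anomaly: sqrt(z) * theta(z)."""
    return math.sqrt(max(z, 0.0))

def screening_term(z: float) -> float:
    """High-temperature screening anomaly: (-z)**1.5 * theta(-z)."""
    return max(-z, 0.0) ** 1.5

def derivative(f, z: float, h: float = 1e-8) -> float:
    """Central finite difference; requires |z| >> h so the kink at z = 0 is not straddled."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

if __name__ == "__main__":
    for dist in (1e-2, 1e-4, 1e-6):
        print(f"|z| = {dist:.0e}: d(DOS term)/dz = {derivative(dos_term, dist):10.2f}, "
              f"d(screening term)/dz = {derivative(screening_term, -dist):.2e}")
```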

The temperature at which a crossover occurs from the stronger to the weaker singularity depends on the peculiarities of the phonon spectrum of a particular material. The peculiarity of the c/a ratio appears in experiment as an effect integrated over temperature, and therefore the contribution from strong peculiarities at low temperature should remain visible in room-temperature experiments. For example, although for hcp Ti the crossover temperature is about 150 K, the temperature-induced change of the c/a ratio between 0 K and room temperature is of the order of 0.5%, judging from the reported experiment (ref. 49). In hcp Fe, the behaviour of the bands at the Γ and L points is very similar to that in Os, and the peculiarity of c/a is also visible. The difference between hcp Fe and Os is that in the former the bands cross the Fermi level at almost the same pressure, whereas in the latter the crossings occur over a wider pressure range.
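Written out with the standard definitions $\alpha_c = c^{-1}(\partial c/\partial T)_P$ and $\alpha_a = a^{-1}(\partial a/\partial T)_P$ (assumed here; the text does not define them explicitly), the temperature integration referred to above reads

$$\ln\frac{(c/a)(T)}{(c/a)(0\,\mathrm{K})} = \int_0^{T}\left[\alpha_c(T') - \alpha_a(T')\right]dT'$$

so an anisotropy $\alpha_c - \alpha_a$ confined to low temperatures still leaves a permanent offset in the room-temperature c/a ratio. A change of about 0.5%, as quoted for Ti, corresponds to an integral of about 0.005.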

A CLC transition discovered in this work may also lead to an observable peculiarity of the c/a ratio. The anisotropy of the thermal expansion in hcp metals is established in equation (4). As with the ETT, the anisotropy might be strengthened by a modification of the non-local pseudopotential, equations (1) and (2), acting on the valence electrons, owing to the substantial reconstruction of the inner 5p and 4f states at the transition. In the model given by equation (3), the reconstruction of the inner states affects the electron-ion interaction via the ion-core polarizability at imaginary frequencies. The derivation of equation (3) is based on a diagrammatic treatment of the model of polarized ions in an electron gas suggested in ref. 52. All the two-ion contributions to the total energy (that is, all diagrams containing two polarizability blocks) can be separated into two classes: one is the van der Waals interaction screened by itinerant electrons; the other, as was demonstrated in ref. 50, can be interpreted as a second-order contribution of the pseudopotential, equation (3), with the only assumption being that the excitation energy of the ionic core is higher than typical energies of the itinerant-electron subsystem (such as the plasma frequency and the Fermi energy). More details can be found in refs 27 and 53.


31. Georges, A., Kotliar, G., Krauth, W. & Rozenberg, M. J. Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions. Rev. Mod. Phys. 68, 13-125 (1996).

32. Boehnke, L. et al. Orthogonal polynomial representation of imaginary-time Green's functions. Phys. Rev. B 84, 075145 (2011).

33. Aichhorn, M. et al. Dynamical mean-field theory within an augmented plane-wave framework: assessing electronic correlations in the iron pnictide LaFeAsO. Phys. Rev. B 80, 085101 (2009).

34. Aichhorn, M., Pourovskii, L. & Georges, A. Importance of electronic correlations for structural and magnetic properties of the iron pnictide superconductor LaFeAsO. Phys. Rev. B 84, 054529 (2011).

35. Gull, E. et al. Continuous-time Monte Carlo methods for quantum impurity models. Rev. Mod. Phys. 83, 349-404 (2011).

36. Beach, K. S. D. Identifying the maximum entropy method as a special limit of stochastic analytical continuation. Preprint at http://arxiv.org/abs/cond-mat/0403055 (2004).

37. Solovyev, I. V., Dederichs, P. H. & Anisimov, V. I. Corrected atomic limit in the local-density approximation and the electronic structure of d impurities in Rb. Phys. Rev. B 50, 16861-16871 (1994).

38. Czyzyk, M. T. & Sawatzky, G. A. Local-density functional and on-site correlations: the electronic structure of La2CuO4 and LaCuO3. Phys. Rev. B 49, 14211-14228 (1994).

39. Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864-B871 (1964).

40. Blaha, P., Schwarz, K., Madsen, G., Kvasnicka, D. & Luitz, J. WIEN2k, An Augmented Plane Wave + Local Orbitals Program for Calculating Crystal Properties (Karlheinz Schwarz, Techn. Universität Wien, Austria, 2001).

41. Wills, J. M. et al. Full-Potential Electronic Structure Method (Springer, 2010).

42. Armiento, R. & Mattsson, A. E. Functional designed to include surface effects in self-consistent density functional theory. Phys. Rev. B 72, 085108 (2005).

43. Mattsson, A. E. & Armiento, R. Implementing and testing the AM05 spin density functional. Phys. Rev. B 79, 155101 (2009).

44. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865-3868 (1996); erratum 78, 1396 (1997).

45. Haas, P., Tran, F. & Blaha, P. Calculation of the lattice constant of solids with semilocal functionals. Phys. Rev. B 79, 085104 (2009); erratum 79, 209902 (2009).

46. Ruban, A. V. & Abrikosov, I. A. Configurational thermodynamics of alloys from first principles: effective cluster interactions. Rep. Prog. Phys. 71, 046501 (2008).

47. Pourovskii, L. V. et al. Impact of electronic correlations on the equation of state and transport in ε-Fe. Phys. Rev. B 90, 155120 (2014).

48. Katsnelson, M. I. & Trefilov, A. V. Fermi-liquid theory of electronic topological transitions and screening anomalies in metals. Phys. Rev. B 61, 1643-1645 (2000).

49. Nizhankovskij, V. I. et al. Anisotropy of the thermal expansion of titanium due to proximity to an electronic topological transition. JETP Lett. 59, 733-737 (1994).

50. Souvatzis, P., Eriksson, O. & Katsnelson, M. I. Anomalous thermal expansion in alpha-titanium. Phys. Rev. Lett. 99, 015901 (2007).

51. Vaks, V. G. et al. Pre-transition softening and anomalous pressure dependence of shear constants in alkali and alkaline-earth metals due to band-structure effects. J. Phys. Condens. Matter 3, 1409-1428 (1991).

52. Rehr, J. J., Zaremba, E. & Kohn, W. van der Waals forces in the noble metals. Phys. Rev. B 12, 2062-2066 (1975).

53. Vonsovsky, S. V., Katsnelson, M. I. & Trefilov, A. V. Localized and itinerant behavior of electrons in metals. Phys. Met. Metallogr. 76, 247-299 (1993).

54. Dewaele, A., Loubeyre, P. & Mezouar, M. Equations of state of six metals above 94 GPa. Phys. Rev. B 70, 094112 (2004).

55. Litasov, K. D. et al. Thermal equation of state to 33.5 GPa and 1673 K and thermodynamic properties of tungsten. J. Appl. Phys. 113, 133505 (2013).

56. Sahu, B. R. & Kleinman, L. Osmium is not harder than diamond. Phys. Rev. B 72, 113106 (2005).

57. Dorfman, S. M., Prakapenka, V. B., Meng, Y. & Duffy, T. S. Intercomparison of pressure standards (Au, Pt, Mo, MgO, NaCl and Ne) to 2.5 Mbar. J. Geophys. Res. 117, B08210 (2012); errata 117, B11204 (2012).

58. Fei, Y. et al. Toward an internally consistent pressure scale. Proc. Natl Acad. Sci. USA 104, 9182-9186 (2007).


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Fig. 1 panels a and b; legend: Dewaele et al., 2004; Sokolova et al., 2013; Litasov et al., 2013; This work; x axes: Pressure, GPa]

Extended Data Figure 1 | Equations of state of W. a, Comparison of different EOSs of W reported by Dewaele et al. (ref. 54), Sokolova et al. (ref. 23) and Litasov et al. (ref. 55), and in this work. Although the curves agree for pressures up to about 50 GPa, there is a substantial discrepancy for pressures around 0.5 TPa. b, Pressure dependence of the unit cell volume of W. Experimental data points (red solid dots represent data collected using a Au-W mixture, green diamonds using a Pt-W mixture) were fitted using the third-order Birch-Murnaghan EOS (blue solid line, V0 = 31.674(3) Å³ per unit cell, K300 = 307(2) GPa, K' = 4.53(4)); the data are equally well fitted with the Vinet EOS (V0 = 31.686(7) Å³ per unit cell, K300 = 302(1) GPa, K' = 4.82(3)).
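To make the two fitted forms comparable, the sketch below evaluates the standard textbook expressions of the third-order Birch-Murnaghan and Vinet EOSs with the W parameters quoted in this caption. It is an illustration of the functional forms only, not the fitting procedure used in this work; the sample volumes are arbitrary choices.

```python
import math

def birch_murnaghan3(volume, v0, k0, k0_prime):
    """Third-order Birch-Murnaghan EOS: pressure (in the units of k0) at a given volume."""
    f = (v0 / volume) ** (2.0 / 3.0)  # (V0/V)^(2/3)
    return 1.5 * k0 * (f ** 3.5 - f ** 2.5) * (1.0 + 0.75 * (k0_prime - 4.0) * (f - 1.0))

def vinet(volume, v0, k0, k0_prime):
    """Vinet EOS: pressure (in the units of k0) at a given volume."""
    eta = (volume / v0) ** (1.0 / 3.0)  # (V/V0)^(1/3)
    return 3.0 * k0 * (1.0 - eta) / eta ** 2 * math.exp(1.5 * (k0_prime - 1.0) * (1.0 - eta))

# Fitted W parameters from the caption (V0 in A^3 per unit cell, K300 in GPa).
W_BM3 = dict(v0=31.674, k0=307.0, k0_prime=4.53)
W_VINET = dict(v0=31.686, k0=302.0, k0_prime=4.82)

if __name__ == "__main__":
    for v in (31.0, 28.0, 25.34, 23.0):
        p_bm = birch_murnaghan3(v, **W_BM3)
        p_vi = vinet(v, **W_VINET)
        print(f"V = {v:6.2f} A^3: BM3 {p_bm:7.1f} GPa, Vinet {p_vi:7.1f} GPa")
```

Within the compression range covered by the data, the two parameter sets give nearly identical pressures, which is what "equally well fitted" means in practice; the forms diverge only on extrapolation.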
Extended Data Figure 2 | Examples of full-profile (GSAS) treated diffraction patterns. a, Au-Pt mixture collected in a dsDAC at 482(5) GPa (Au pressure scale from ref. 20). Even at pressures close to 5 Mbar, powder X-ray diffraction data are sufficient to clearly resolve the peaks of Au and Pt and to accurately determine the lattice parameters of both metals. b, W-Pt mixture, unsuccessful experiment in a dsDAC. According to the W EOS, the pressure is 461(7) GPa, whereas according to the Pt EOS it is 559(10) GPa. This inconsistency in pressures results from an inhomogeneous distribution of the two metals between the secondary anvils in the dsDAC; such data cannot be used for constraining EOSs. This observation also shows that very large pressure gradients are possible in dsDACs. c, Mixture of Os (a = 2.5404(6) Å, c = 4.0386(10) Å), W (a = 2.9133(5) Å), and Au (a = 3.6657(6) Å) collected in a conventional DAC (Ne used as the pressure-transmitting medium) at a pressure of 134(2) GPa.
Extended Data Figure 3 | Experimental dependence of the unit cell volume of Os on pressure in comparison with EOSs reported in the literature. The magenta line is from Occelli et al. (ref. 3; K300 = 421 GPa, K' = 4.0) and the green line is from Cynn et al. (ref. 4; K300 = 463 GPa, K' = 2.8). These data have been re-fitted using the ruby pressure scale as suggested in refs 20, 22 and references therein. Experimental data points (solid red dots) were fitted using the third-order Birch-Murnaghan EOS (blue line, V0 = 28.02(4) Å³ per unit cell, K300 = 399(6) GPa, K' = 4.04(4)).
Extended Data Figure 4 | Electronic band structure of Os at moderate compressions along the high-symmetry lines in the hcp Brillouin zone. Energies are given relative to the Fermi energy, which is taken to be zero. Calculations are carried out at pressures of 0 GPa (a, b), 97.5 GPa (c, d), and 134 GPa (e, f). a, c, e, The k-resolved spectral functions A(k, ω) obtained with LDA+DMFT. b, d, f, The band structure obtained with LDA. In both cases, we used the experimental lattice parameters (a = 2.734 Å, c/a = 1.580 at 0 GPa; a = 2.578 Å, c/a = 1.589 at 97.5 GPa; a = 2.540 Å, c/a = 1.590 at 134 GPa). Our LDA+DMFT calculations predict that two ETTs occur in hcp Os upon compression. In a, a band at the Γ point is well below the Fermi energy at ambient pressure; however, it nearly touches the Fermi energy at 97.5 GPa, and the corresponding hole pocket has already appeared at 134 GPa, giving rise to a change of the Fermi surface topology, that is, to an electronic topological transition at about 101.5 GPa (Extended Data Fig. 5). Our LDA+DMFT calculations also indicate that another ETT, at the L point, should occur above 134 GPa, at about 183 GPa (Extended Data Fig. 5).
Extended Data Figure 5 | The position of the relevant bands at the Γ (top) and L (bottom) points obtained from LDA+DMFT and LDA calculations as a function of pressure. Positive values indicate the appearance of the corresponding hole pockets. Energies are given relative to the Fermi energy. A closer examination of the band shape in the vicinity of the L point reveals that this hole pocket first appears along the L-H line and then extends to include the L point. To estimate the critical pressures for the ETTs more precisely, we plot the positions of the relevant bands at the Γ and L points with respect to the Fermi level and obtain the critical pressures by interpolation. The LDA+DMFT calculations predict that the hole pockets at the Γ and L points appear at about 101.5 GPa and 183 GPa, respectively.
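The interpolation step described in this caption can be sketched as follows. The sketch assumes linear interpolation between neighbouring calculated pressures (the caption does not say which interpolation scheme was used), and the band energies in the usage example are made-up numbers, not the values plotted in the figure.

```python
def critical_pressure(pressures, band_energies):
    """Pressure at which a band (energy relative to E_F) first reaches E_F.

    Positive energies mean a hole pocket is present. Pressures must be sorted
    in ascending order. The crossing is located by linear interpolation between
    the two bracketing calculated points; returns None if no crossing is sampled.
    """
    points = list(zip(pressures, band_energies))
    for (p1, e1), (p2, e2) in zip(points, points[1:]):
        if e1 == 0.0:
            return p1
        if e1 * e2 < 0.0 or e2 == 0.0:
            # Straight line through (p1, e1) and (p2, e2), solved for e = 0.
            return p1 - e1 * (p2 - p1) / (e2 - e1)
    return None

if __name__ == "__main__":
    # Hypothetical band energies in eV at hypothetical pressures in GPa.
    pressures = [0.0, 97.5, 134.0, 247.0]
    energies = [-0.30, -0.02, 0.05, 0.25]
    print(f"Estimated critical pressure: {critical_pressure(pressures, energies):.1f} GPa")
```

Linear interpolation is the simplest choice when only a few expensive LDA+DMFT points are available; a denser pressure grid near the crossing would reduce the interpolation error.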
Extended Data Figure 6 | k-resolved spectral functions A(k, ω) of Os at ultra-high compressions along the high-symmetry lines in the hcp Brillouin zone. a, b, Calculations are carried out at pressures of 247 GPa (a) and 477 GPa (b) using LDA+DMFT. The experimental lattice parameters were used in the calculations (a = 2.449 Å, c/a = 1.596 at 247 GPa; a = 2.335 Å, c/a = 1.597 at 477 GPa). Energies are given relative to the Fermi energy, which is taken to be zero. Hole pockets are present at the Γ and L points at P = 247 GPa. Increasing the pressure to 477 GPa does not induce any new ETTs. Moreover, no features of the band structure suggest that new ETTs might be induced by a further increase of pressure within reasonable limits.
Extended Data Figure 7 | Electronic band structure of Os at moderate compressions along the high-symmetry lines in the hcp Brillouin zone. a-d, Band structure of Os at 0 GPa (a, b) and at 134 GPa (c, d). Energies are given relative to the Fermi energy. In a and c, spin-orbit coupling has been included, whereas in b and d, the energies were obtained with the scalar-relativistic approximation. Experimental lattice parameters were used in the calculations (see Extended Data Fig. 4). Bands are shown in different colours for clarity.
Extended Data Figure 8 | Electronic band structure of Os at extreme compressions along the high-symmetry lines in the hcp Brillouin zone. a-d, Band structure of Os at 247 GPa (a, b) and at 477 GPa (c, d). Energies are given relative to the Fermi energy. In a and c, spin-orbit coupling has been included, whereas in b and d, the energies were obtained with the scalar-relativistic approximation. Experimental lattice parameters were used in the calculations (see Extended Data Fig. 6).
Extended Data Figure 9 | Comparison of the EOS for hcp Os calculated using different approximations within DFT with the experimental EOS measured in this work. Shown are GGA calculations carried out by us using the RSPt method (black solid line) and by Sahu et al. (ref. 56; blue dashed line), as well as LDA calculations carried out by us using the Wien2k method (red solid line), by Cynn et al. (ref. 4; pink dashed line), and by Sahu et al. (ref. 56; red dashed line). The
experimental EOS obtained in this work is shown with black circles. The 
different curves agree reasonably well for pressures up to about 50 GPa. In 
contrast, there is noticeable discrepancy at ultra-high pressure, and there is no 
theoretical EOS that accurately describes the experiment for pressures around 
0.5 TPa. Local and semi-local approximations within DFT are insufficient to 
describe the P-V relationship of Os under extreme conditions. Part of the 
reason for the disagreement between theory and experiment might be related to 
the improper account of the many-electron effects within the theory. In 
principle, LDA and GGA work better at high pressure; however, errors that 
might be important at low pressure could propagate through the EOS to the
whole pressure range, owing to the use of all the calculated points in the fitting 
of the energy versus volume data by, for example, the third-order Birch- 
Murnaghan EOS used in this work. To justify this statement, consider the 
calculated EOS parameters, summarized in Extended Data Table 1. The equilibrium volumes calculated by us differ from the experiment (Extended
Data Table 1) by less than 1.5%. On the other hand, the overestimation of the 
calculated bulk moduli (B) and their pressure derivatives (B’) is greater, about 
10%. Our EOS parameters are within the range of theoretical parameters 
available in the literature, which are fitted for P< 100 GPa (Extended Data 
Table 1). We deal with a highly incompressible metal, for which typical DFT 
errors in B, and especially in B’, translate into large differences in P-V 
relationships at ultra-high pressure. Even in this regime, the error in volume at a 
fixed pressure remains within typical DFT limits of about 2%-3%. However, the 
pressure calculated at fixed volume can differ by several tens of gigapascals. This 
difference is due to errors in B and B’ calculated at ambient pressure, coupled to 
a very high value of B. The use of more advanced theoretical methods could 
improve the calculated EOS. In ref. 47, a substantial reduction of B in 
isoelectronic hcp Fe is demonstrated using a LDA+DMFT approach. Here the 
effect is expected to be smaller, but may still be sufficient to improve the 
agreement with experiment. Our results demonstrate the need for further development of electronic structure theory, with the experiment reported here providing a benchmark. For this reason, we consistently used experimental lattice parameters in the discussion of the electronic structure of Os in this study.
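The argument in this caption, that modest errors in B and B' at ambient pressure become tens of gigapascals at fixed volume but only a few per cent in volume at fixed pressure, can be checked with the Os parameters of Extended Data Table 1. The sketch compares the experimental third-order Birch-Murnaghan fit with our GGA (RSPt) and LDA (Wien2k) parameter sets. Treating the theoretical (V0, B, B') as Birch-Murnaghan parameters follows the caption's description of the fitting; the fixed volume of 21.0 Å³ and the target pressure of 200 GPa are arbitrary illustrative choices.

```python
def birch_murnaghan3(volume, v0, k0, k0_prime):
    """Third-order Birch-Murnaghan EOS: pressure (in the units of k0) at a given volume."""
    f = (v0 / volume) ** (2.0 / 3.0)  # (V0/V)^(2/3)
    return 1.5 * k0 * (f ** 3.5 - f ** 2.5) * (1.0 + 0.75 * (k0_prime - 4.0) * (f - 1.0))

def volume_at_pressure(pressure, v0, k0, k0_prime, tol=1e-10):
    """Invert P(V) by bisection on [0.5*V0, V0], where P decreases monotonically with V."""
    lo, hi = 0.5 * v0, v0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if birch_murnaghan3(mid, v0, k0, k0_prime) > pressure:
            lo = mid  # pressure too high, so the volume must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Os parameter sets from Extended Data Table 1 (V0 in A^3 per unit cell, B in GPa).
OS_EXPERIMENT = dict(v0=28.02, k0=398.5, k0_prime=4.04)
OS_GGA_RSPT = dict(v0=28.00, k0=448.0, k0_prime=4.29)
OS_LDA_WIEN2K = dict(v0=27.66, k0=447.0, k0_prime=4.41)

if __name__ == "__main__":
    v_fixed, p_target = 21.0, 200.0
    v_ref = volume_at_pressure(p_target, **OS_EXPERIMENT)
    for name, params in (("experiment", OS_EXPERIMENT), ("GGA", OS_GGA_RSPT), ("LDA", OS_LDA_WIEN2K)):
        p = birch_murnaghan3(v_fixed, **params)
        v = volume_at_pressure(p_target, **params)
        print(f"{name:10s} P(V=21.0) = {p:6.1f} GPa, V(200 GPa) = {v:6.3f} A^3 "
              f"({100.0 * (v - v_ref) / v_ref:+.2f}% vs experiment)")
```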
Extended Data Figure 10 | Comparison of the ratio of the lattice parameters c/a for hcp Os. The ratio is calculated using the RSPt method at T = 0 K with a PBE-GGA (red dashed line), and using Wien2k with an LDA (green line) and a PBE-GGA (blue line). The room-temperature experimental results obtained in this work are shown with filled black dots. Agreement between the calculated and experimental c/a ratio is typical for DFT calculations. The theoretical results do not show any peculiarity of the lattice-parameter ratio; however, the calculations are carried out at T = 0 K, whereas the experimental data are taken at room temperature. Therefore, a direct comparison between theory and experiment is nontrivial (Methods). In hcp metals, the effect of the electronic transitions on the lattice-parameter ratio should become visible at finite temperatures, owing to the peculiarities of the thermal expansion coefficients, which are anisotropic along different directions of the crystal lattice.
Extended Data Table 1 | EOS parameters of Au, Pt, W, and Os

a, Experimental data

Metal | V0 (Å³ per unit cell) | K0 (GPa) | K' | Source; pressure range
Au | 67.85 | 167.5 | 5.61 | Ref. 20; 550 GPa
Au | 67.85 | 167 | 5.88 | Ref. 57; 250 GPa
Au | 67.85 | 167 | 6.0 | Ref. 58; 100 GPa
Pt | 60.389(5) | 274(2) | 5.23(3) | This work, third-order Birch-Murnaghan; 480 GPa
Pt | 60.381(5) | 261(2) | 5.83(3) | This work, Vinet; 480 GPa
Pt | 60.38 | 276.4 | 5.12 | Ref. 20; 550 GPa
Pt | 60.38 | 277 | 5.08 | Ref. 54; 94 GPa
Pt | 60.38 | 277 | 5.43 | Ref. 57; 250 GPa
W | 31.674(3) | 306.8(1.5) | 4.53(4) | This work, third-order Birch-Murnaghan; 200 GPa
W | 31.686(7) | 301.9(1.2) | 4.82(3) | This work, Vinet; 200 GPa
W | 31.724 | 296 | 4.30 | Ref. 54; 94 GPa
W | 31.691 | 317 | 3.16 | Ref. 55; 33 GPa
W | 31.72 | 308 | 4.25 | Ref. 23; 400 GPa
Os | 28.02(4) | 398.5(5.9) | 4.04(4) | This work, third-order Birch-Murnaghan; 517 GPa
Os | 28.08(5) | 380(7) | 4.48(6) | This work, Vinet; 517 GPa
Os | 27.96 | 462 | 2.4 | Ref. 4; 65 GPa
Os | 27.941 | 411 | 4.0 | Ref. 3; 75 GPa
Os | 27.977 | 395 | 4.5 | Ref. 2; 58 GPa

b, Theoretical data (Os)

Method | V0 (Å³ per unit cell) | B (GPa) | B'
GGA (RSPt, this work) | 28.00 | 448 | 4.29
GGA (RSPt, this work, fitted at P < 100 GPa) | 28.00 | 423 | 4.67
GGA (ref. 56) | 28.72 | 401 | 5.47
GGA (ref. 3) | - | 382 | 4.60
LDA (Wien2k, this work) | 27.66 | 447 | 4.41
LDA (ref. 3) | - | 437 | 4.46
LDA (ref. 56) | 27.60 | 454 | 5.26
LDA (ref. 4) | 27.50 | 445 | 4.4

a, Experimental data; b, theoretical data. Values in a are from this work and refs 2-4, 20, 23, 54, 55, 57, and 58. Presented in b are GGA calculations carried out by us using the RSPt method and the calculations presented in ref. 56, as well as LDA calculations carried out by us using the Wien2k method and the calculations presented in refs 3, 4, and 56.
protein-DNA nanowires
Biomolecular self-assemblies are of great interest to nanotechnologists because of their functional versatility and their biocompatibility. Over the past decade, sophisticated single-component nanostructures composed exclusively of nucleic acids, peptides and proteins have been reported, and these nanostructures have been used in a wide range of applications, from drug delivery to molecular computing. Despite these successes, the development of hybrid co-assemblies of nucleic acids and proteins has remained elusive. Here we use computational protein design to create a protein-DNA co-assembling nanomaterial whose assembly is driven by non-covalent interactions. To achieve this, a homodimerization interface is engineered onto the Drosophila Engrailed homeodomain (ENH), allowing the dimerized protein complex to bind to two double-stranded DNA (dsDNA) molecules. By varying the arrangement of protein-binding sites on the dsDNA, an irregular bulk nanoparticle or a nanowire with single-molecule width can be spontaneously formed by mixing the protein and dsDNA building blocks. We characterize the protein-DNA nanowire using fluorescence microscopy, atomic force microscopy and X-ray crystallography, confirming that the nanowire is formed via the proposed mechanism. This work lays the foundation for the development of new classes of protein-DNA hybrid materials. Further applications can be explored by incorporating DNA origami, DNA aptamers and/or peptide epitopes into the protein-DNA framework presented here.

The functionality of single-component biomolecular materials is ultimately limited by the physical and/or chemical nature of their constituent parts. Advanced materials that integrate a variety of chemical building blocks will facilitate the incorporation of greater functionality in materials design. Hybrid assemblies that integrate both protein and DNA have been engineered for several useful applications, including catalytic cascades, amplified biosensors and functional templates for the growth of inorganic materials. These materials, however, rely on DNA self-assembly to form a scaffold to which proteins are attached via chemical conjugation methods, an approach that is problematic for two reasons. First, chemical conjugation as a means to molecular assembly has several drawbacks, including a complicated synthesis process, heterogeneous labelling and inapplicability to in vivo systems. Conjugation problems have been mitigated somewhat through the use of RNA aptamers, which allow the programmed positioning of proteins in vivo. Nevertheless, existing protein-nucleic-acid materials still suffer from a second shortcoming in that the main body of the nanostructure is formed solely from nucleic acids. A co-assembling scaffold that integrally incorporates both DNA and protein would allow greater structural diversity and permit easy temporal and spatial control over the self-assembly. However, the construction of such a hybrid has thus far remained elusive.

Here we create co-assembling protein-DNA nanomaterials via non-covalent interactions. We start by engineering a dual-function protein that contains both a homodimerization domain and a DNA-binding domain. This designed protein homodimer serves as a primary building block that binds two DNA molecules; each monomeric subunit binds a specific fragment of dsDNA. We chose ENH as our scaffold protein for the following reasons: (1) a single-helical peptide from ENH (helix 3) binds a target dsDNA motif (TAATNN) as tightly as the full-length protein (the dissociation constant, Kd, is in the nanomolar range), indicating that DNA-binding functionality could be isolated to a specific domain; (2) ENH has been intensely studied using computational tools, and highly stable variants, full-sequence redesigns and a de novo homodimer design have been generated; and (3) as a three-helix protein, ENH provides a surface for homodimer interface design (the exterior faces of helices 1 and 2) that is structurally opposite its DNA-binding helix 3.

Figure 1 illustrates our protein-DNA nanomaterial design strategy. 
Using the Drosophila melanogaster ENH crystal structure (Protein 
Data Bank (PDB) accession 1ENH) as our docking subunit (Fig. 1a), 
we performed fast Fourier transform-based docking to generate C2 
symmetrical homodimer models. The best model exhibited parallel 
intermolecular helical packing between helices 1 and 2 of each of the 
ENH monomers. Computational protein design was then used to 
design the interface residues of the docked model to minimize the free 
energy of the intermolecular side-chain interactions (Fig. 1b). Early design variants were characterized and iteratively improved with the use of a molecular dynamics screening protocol. We named the final designed variant dualENH because it has dual functionality: it can both homodimerize and bind dsDNA. A dualENH homodimer serves as the protein building block for nanomaterial assembly because it has two binding sites for dsDNA (one on each of two opposite faces of the homodimer), as shown in an aligned model (Fig. 1c).
The second designed component of the nanomaterials is a dsDNA 
building block with protein-binding sites variously placed along the 
double helix (Fig. 1d and Extended Data Fig. 1a). By tuning the posi- 
tioning of binding sites on the dsDNA and then simply mixing the two 
designed components (DNA and protein) together, we were able to 
achieve co-assembly of both irregularly shaped particles of protein and 
DNA (Extended Data Fig. 1b) and well-ordered protein-DNA nanowires (Fig. 1e).
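FFT-based docking scores every rigid translation of one subunit relative to another in a single pass by using the correlation theorem. The toy below shows only that core idea on a one-dimensional grid with a simple overlap score; it is not the docking program used in this work, and it omits the rotational search, the surface/interior weighting of real docking scoring functions and the C2 symmetry constraint. The grids RECEPTOR and LIGAND are hypothetical. A small pure-Python FFT keeps the example dependency-free; in practice a library routine such as numpy.fft would be used.

```python
import cmath

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(values):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(values)
    return [x.conjugate() / n for x in fft([v.conjugate() for v in values])]

def translation_scores(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[(x - t) % n] for all shifts t at once.

    Correlation theorem: DFT(score) = DFT(receptor) * conj(DFT(ligand)).
    """
    fr, fl = fft(receptor), fft(ligand)
    return [s.real for s in ifft([a * b.conjugate() for a, b in zip(fr, fl)])]

def brute_force_scores(receptor, ligand):
    """Direct O(n^2) evaluation of the same scores, for checking."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - t) % n] for x in range(n)) for t in range(n)]

# Hypothetical 1D grids: 1 marks an occupied interface cell.
RECEPTOR = [0, 0, 1, 1, 0, 0, 0, 0]
LIGAND = [1, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    scores = translation_scores(RECEPTOR, LIGAND)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"best shift = {best}, score = {scores[best]:.1f}")
```

The FFT route costs O(n log n) per rotation instead of O(n^2), which is what makes an exhaustive translational scan affordable on 3D grids.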

We characterized dualENH to confirm that: (1) it forms a homodimer with
helical secondary structure and no higher-order aggregation; (2) it binds
dsDNA probes specifically; and (3) each homodimer binds two dsDNA molecules
(Supplementary Information and Extended Data Figs 2-4). We next sought to
observe the protein-DNA self-assembly using fluorescence microscopy.
Figure 2a shows that nanoparticles were formed immediately after 5 μM
dualENH was mixed with 2 μM (TAA)₅. The particles were irregularly shaped,
with diameters of up to several micrometres (see Extended Data Fig. 5a for
the size distribution). The irregularity of shape was expected because
(TAA)₅ has four ENH-binding sites (TAATAA) that each face in a different
direction off the dsDNA helix, which causes particle growth to occur in a
random branching pattern (Extended Data Fig. 1b). The particles are
invisible under bright-field microscopy
(Extended Data Fig. 5b) because of the transparency of protein and DNA
to visible light. A control experiment showed that a solution of (TAA)₅
by itself did not form any particles (Extended Data Fig. 5c). When lower
concentrations of dualENH (500 nM) and (TAA)₅ (200 nM) were used, a smaller
and more uniform particle distribution was observed (Extended Data
Fig. 5d). Further decreasing the protein concentration (≤200 nM) greatly
reduced the number of particles formed (Extended Data Fig. 5e). This
reduction may be due to dissociation of the homodimer at low
concentrations. To confirm that nanoparticle formation occurs via the
proposed mechanism (Fig. 1e), we designed a particle inhibition experiment
using dsDNA with a single binding site as the inhibitor. The single binding
site on these dsDNAs should terminate particle growth. Pre-incubation of
500 nM dualENH with only trace amounts of single-binding-site dsDNA (5 nM)
abolished particle formation completely when 200 nM (TAA)₅ was added
(Extended Data Fig. 5f).

Figure 1 | Protein-DNA nanomaterial design strategy. a, Helix 1 and helix 2
(green) of ENH were engineered into a homodimerization domain, and helix 3
(blue) is the native DNA-binding domain. b, The interface of the docked model
was designed for homodimerization. c, The designed homodimer, named dualENH,
binds two dsDNA fragments on its outward faces. This model was generated by
aligning the homodimer model in b with the ENH-DNA co-crystal structure (PDB
accession 3HDD). d, Two protein-binding sites were engineered onto a dsDNA
fragment so that two dualENH dimers would bind 180° apart along the double
helix. e, The dualENH protein in c and the dsDNA fragment in d co-assemble
into a protein-DNA nanowire. Note that this two-dimensional cartoon is for
purposes of illustration only, and that the three-dimensional design model of
the nanowire is spiralled.

Figure 2 | Fluorescence microscopy of protein-DNA nano-objects. a, dsDNA
(TAA)₅ fragments were labelled with the fluorescent dye Cy3. A fluorescence
image was taken of particles formed by mixing 5 μM dualENH with 2 μM
Cy3-(TAA)₅ in 20 mM Tris-HCl and 100 mM NaCl at pH 8.0. The particles formed
irregular shapes up to ~5 μm in diameter. b, Same experiment as in a except
that 25-nucleotide dsDNA fragments containing motif 11 (TAATTTAATTT) in the
middle (CGCAGTGTAATTTAATTTCCTCGAC; motif 11 occupies positions 8-18) were
used instead of (TAA)₅ fragments. All particle sizes are under the
diffraction limit (submicrometre). The particles appear slightly oval rather
than circular because of moderate geometrical aberrations of the microscopy
system.

To form a linear protein-DNA co-assembly, as illustrated in Fig. 1e, the
dsDNA building block must have two protein-binding sites about 180° apart
on the dsDNA double helix (Fig. 1d). We designed a 25-nucleotide dsDNA
molecule with an 11-nucleotide binding motif (TAATTTAATTT, named motif 11)
that contains two ENH-binding motifs (TAATTT) facing in opposite directions
off the helix. At the same protein and DNA concentrations used in the
earlier particle-forming experiment (5 μM and 2 μM, respectively)
(Fig. 2a), dsDNA
containing motif 11 and dualENH formed much smaller and more 
uniform particles, with none growing greater in size than the diffrac- 
tion limit (submicrometre) (Fig. 2b). The reduced particle size may be 
a result of fewer protein-binding sites on the DNA building block (two 
versus four). Fewer protein-binding sites would decrease the entropy 
of self-assembly and increase the chance of binding-site poisoning. We 
used atomic force microscopy (AFM) to study the topology of 
the protein-DNA co-assembly formed with dualENH and the two-binding-site
dsDNA. Nanowire structures were clearly observed with a width of ~15 nm and
a length of up to ~300 nm (Fig. 3a, b), which corresponds to ~60 repeated
units of protein and dsDNA on the basis of the design model. In accordance
with the design model, the observed width of the nanowires (~15 nm) is
consistent with the length of the dsDNA (~9 nm), considering that AFM
usually overestimates lateral (x-y) dimensions because of the finite size of
the tip. The height of the nanowire is ~1.0 nm (Fig. 3c), which is on the
order of the diameter of a dsDNA fragment (~2 nm); the decreased height
could be due to compression by the hard AFM tip (k = 3 N m⁻¹). Note that
dsDNA molecules lie flat on a Mg²⁺-mica substrate because of strong
electrostatic interactions between dsDNA and Mg²⁺ ions, and thus a flat
ribbon, instead of the spiralled structure predicted in the design model,
was observed in the AFM measurements.
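The width argument above rests on simple arithmetic, reproduced in the sketch below. It assumes the commonly used B-form rise of ~0.34 nm per base pair (an assumption, not a value given in this Letter). The difference between the observed ~15 nm width and the computed dsDNA length is the portion attributed to tip size, and dividing ~300 nm by ~60 repeat units gives the implied average length per protein-DNA unit. The helper name `dsdna_length_nm` is hypothetical.

```python
# Back-of-envelope check of the AFM dimensions (illustrative sketch).
# Assumption: B-form DNA with an axial rise of ~0.34 nm per base pair.
RISE_NM_PER_BP = 0.34


def dsdna_length_nm(n_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Length of a straight B-form duplex of n_bp base pairs, in nanometres."""
    return n_bp * rise_nm


length_nm = dsdna_length_nm(25)          # 25-bp two-binding-site fragment
observed_width_nm = 15.0                 # nanowire width measured by AFM
tip_excess_nm = observed_width_nm - length_nm
per_unit_nm = 300.0 / 60                 # longest wire length / quoted repeat count

print(f"dsDNA length     ~{length_nm:.1f} nm")
print(f"width excess     ~{tip_excess_nm:.1f} nm (attributed to tip size)")
print(f"length per unit  ~{per_unit_nm:.1f} nm")
```

The 8.5 nm duplex length matches the ~9 nm quoted in the text; the remaining ~6.5 nm of apparent width is what tip convolution would have to account for.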

We solved the co-crystal structure of dualENH with a dsDNA probe 
containing motif 11 (Fig. 4). These structural data confirmed both the 
spiralled nature of the nanowire and the dual functionality of 
dualENH: (1) dualENH uses helix 3 to bind dsDNA just as wild-type 
ENH does; and (2) dualENH uses the surfaces of helix 1 and helix 2 to 
form a homodimer (Fig. 4a). However, the co-crystal structure reveals 
two homodimer configurations of dualENH that differ slightly from 
each other, as seen by their backbone root-mean-square deviation
(r.m.s.d.) of 4.0 Å (Fig. 4b). This unexpected result might be caused by
crystal packing forces, especially as each dsDNA molecule in the crystal 
forms a superhelix (see end-to-end packing of dsDNA fragments in 
Extended Data Fig. 6a). Given this observation, we cannot conclude 
that either of the observed dualENH dimers in the crystal structure 
reflects the predominant dimer structure in solution. The two 
dualENH dimer crystal structures have backbone r.m.s.d. values to 
the design model of 3.8 Å and 3.9 Å, respectively (see Extended Data
Fig. 6b, c). The co-crystal structure confirms that dualENH binds the
dsDNA with its designed 11-nucleotide binding motif; however, it also
reveals two configurations of protein-DNA binding. One configuration is
consistent with our design model, in which two dualENH molecules bind to the
11-nucleotide motif exactly opposite each other on the dsDNA double helix
(Fig. 4c). In the other configuration, however, one of the bound dualENH
molecules is in an inverted orientation; that is, it binds to the reverse
complementary sequence (AAATTA) of the optimal binding motif (TAATTT)
(Fig. 4d). This suboptimal binding has also been seen in other ENH crystal
structures, presumably owing to the high concentrations of protein and DNA
used for crystallization. Because of this alternative protein-DNA-binding
configuration, the nanowire in the co-crystal structure is slightly kinked;
nevertheless, an infinitely repeated protein-DNA nanowire is observed
(Fig. 4e). Future work should focus on gaining greater control over
geometric specificity, especially in efforts to engineer more sophisticated
three-dimensional structures.

Figure 3 | Atomic force microscopy of protein-DNA nanowires. a, Representative
AFM image obtained after mixing 5 μM dualENH with 2 μM of the two-binding-site
dsDNA (25-nucleotide dsDNA containing motif 11, as in Fig. 2b). Nanowire
structures ~15 nm wide and up to 300 nm long are clearly visible. b, Magnified
image of a single nanowire ~250 nm in length. c, Three-dimensional topology
display of b shows that the height of the nanowire is ~1.0 nm.

We used computational protein design to design a protein-DNA nanomaterial
whose co-assembly is driven purely by non-covalent interactions. Unlike
assemblies that rely on chemical conjugation, non-covalent self-assemblies
can be tuned by altering the reaction conditions (for example, temperature,
pH and salt concentration).
Indeed, our protein-DNA nanostructures will not form in high con- 
centrations of salt because the protein-DNA electrostatic interaction is 
shielded (see Extended Data Fig. 7). Recently, a co-assembling system 
for protein nanomaterials was developed in which two different pro- 
teins are required for assembly’*. Co-assembling systems provide sev- 
eral advantages over single-component self-assemblies, including 
better control over the localization and timing of assembly, and greater 
functional and structural versatility of the assembly, as each compon- 
ent can confer unique attributes, especially in the case of hybrid 
materials such as those designed here. The protein-DNA nano- 
structures we designed could be further functionalized with the 
incorporation of engineered DNA structures, such as DNA origami 
or DNA aptamers. Furthermore, dualENH could be fused to peptide 
tags for antibody recognition or used for the specific attachment of 
organic and inorganic materials”. Finally, peptide self-assembly tech- 
niques’ could be incorporated into the protein-DNA framework pre- 


Figure 4 | Co-crystal structure of protein-DNA complex. a, dualENH forms 
a symmetric homodimer using helix 1 and helix 2 (green) as the protein- 
protein interface. Helix 3 (blue) binds to the dsDNA in the same way that wild- 
type ENH does. b, Two forms of dualENH are present in the co-crystal 
structure and occur in a molar ratio of 3:1 (green:cyan). c, d, Two forms of 
protein-DNA binding are observed in the co-crystal structure and occur in a 
molar ratio of 1:1. Both have two dualENH homodimers bound on the designed
11-nucleotide motif TAATTTAATTT. In c, both of the dualENH dimers
bind in the optimal motif (TAATTT) orientation, whereas in d, one of the 
dualENH dimers (right) binds in the suboptimal orientation (AAATTA, the 
reverse complementary sequence of the optimal motif). e, Slightly kinked 
nanowire structure found in the co-crystal structure. 
METHODS


Protein docking and computational design. The ENH crystal structure (PDB
accession 1ENH)³¹ was used as the scaffold for homodimerization. A symmetrical
docking program based on a fast Fourier transform (FFT) algorithm was applied³².
Using shape complementarity as the criterion, the top 200 docking models were
selected and clustered into 11 groups based on structural similarity (r.m.s.d.
values). These clusters were visually inspected and one model was chosen for
homodimer design. In this homodimer model, the two helix 1s (one from each
subunit) form a parallel helix-helix packing assembly and are separated by 10 Å,
which is similar to the 9.8 Å separation found in naturally occurring coiled-coil
dimers³³. Homebuilt computational protein design software was used for
symmetry-constrained homodimer designs. The following types of amino acids were
allowed at the interfacial residues: A, D, E, F, H, I, K, L, N, Q, R, S, T, V, W and Y.
The E28P mutation was identified by the molecular dynamics (MD) simulations as
described elsewhere”. The Rosetta force field was used to rank sequence energies³⁴.
Sequence optimization was performed using an improved version of FASTER³⁵
with a rotamer library based on the backbone-dependent library described
previously³⁶. The sequences of dualENH and wild-type ENH are listed in
Extended Data Table 1. 

Construct preparation, protein expression, and purification. Oligonucleotides
(Integrated DNA Technologies) containing ~20 bp overlapping segments
were assembled via a modified Stemmer PCR method³⁷ using KOD Hot Start
Polymerase (Novagen) to generate genes for wild-type ENH and dualENH.
A His₆ tag and a Gly-Ser linker (GGSGG) were added to the carboxy terminus.
Proteins were expressed in BL21(DE3) cells transformed with pET plasmids, using
1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) in standard Luria Broth (LB)
at 37 °C. Proteins were purified from the supernatant of lysed cells using
affinity chromatography (Ni²⁺-NTA, Qiagen) followed by size-exclusion
chromatography (Superdex 75, Amersham Pharmacia). Expression of dualENH at 37 °C
produced >10 mg of soluble protein per litre of Escherichia coli culture.
Circular dichroism spectroscopy. Circular dichroism (CD) studies were per- 
formed on an Aviv 62A DS spectropolarimeter equipped with a thermoelectric 
temperature controller. Samples were prepared in 100 mM sodium chloride and 
20 mM sodium phosphate buffer at pH 7.5. Wavelength scans and temperature 
denaturations were carried out in cuvettes with a 0.1 cm path length at a protein 
concentration of ~10 μM. Three wavelength scans were performed at 25 °C for
each sample and averaged. The thermal denaturation curve was collected at 222 nm
from 0 °C to 99 °C, sampling every 1 °C with a 2 min equilibration time at each step
(signal averaging time was 1 s). The refolding curve was collected after the thermal 
denaturation experiment using the same sample. 

Analytical ultracentrifugation. dualENH was analysed on an XL-1 analytical 
ultracentrifuge equipped with an AnTi60 rotor (Beckman Coulter). Two-channel 
epon-filled centerpieces were used for the sedimentation velocity experiment. 
Cells were torqued to 130 lb inch and run at 60,000 r.p.m. Data were acquired
at 280 nm at 20 °C in continuous mode. Data were first fit to the c(S) model
(continuous distribution of sedimentation coefficient) and then converted to the
c(M) model (continuous distribution of molecular weight). Time-invariant noise
and baseline offsets were corrected before fitting. A maximum entropy
regularization confidence level of 0.95 was used in all the size distribution analyses.
Polarization fluorescence assay. Fluorescence polarization was measured at room
temperature with a Synergy 2 (BioTek). All DNA oligonucleotides were purchased
from Integrated DNA Technologies and used without further purification. The three
probes have the following sequences: CGCAGTGTAATTACCTCGAC (probe 1; the ENH
protein-binding site is TAATTA), CGCAGTGTACTTACCTCGAC (probe 2; the ENH
protein-binding site carries a single mutation, TACTTA), and
CAGGCAGCAGGTGTTGGACT (negative control). The 3′ terminus of each
probe was labelled with fluorescein. The dsDNA samples were prepared by mixing
equimolar single-stranded DNA with its complementary sequence. The mixture
was heated to 95 °C for 10 min and gradually cooled to room temperature.
dualENH was serially diluted in buffer containing 20 mM Tris-HCl and 100 mM
NaCl at pH 8.0, except for the NaCl-dependent experiments in Extended Data
Fig. 7a. Concentrations of all probes were kept at 25 nM. The total volume of each
sample was kept at 200 μl. The measurements were taken after about 10 min of
equilibration. The G factor was calibrated and kept at 0.87 for all samples.
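For reference, steady-state fluorescence polarization is conventionally computed from the parallel and perpendicular emission intensities, with the G factor correcting for unequal detection of the two channels: P = (I∥ − G·I⊥)/(I∥ + G·I⊥). The sketch below uses that standard definition with G = 0.87 as stated above, alongside the closely related anisotropy r. Whether the Synergy 2 software applies G in exactly this form, the function names, and the intensity values in the example are all assumptions for illustration.

```python
def polarization(i_parallel: float, i_perpendicular: float, g_factor: float = 0.87) -> float:
    """Steady-state polarization P = (I_par - G*I_perp) / (I_par + G*I_perp)."""
    corrected_perp = g_factor * i_perpendicular
    total = i_parallel + corrected_perp
    if total == 0:
        raise ValueError("I_par + G*I_perp must be non-zero")
    return (i_parallel - corrected_perp) / total


def anisotropy(i_parallel: float, i_perpendicular: float, g_factor: float = 0.87) -> float:
    """Related quantity r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    corrected_perp = g_factor * i_perpendicular
    return (i_parallel - corrected_perp) / (i_parallel + 2.0 * corrected_perp)


if __name__ == "__main__":
    # Hypothetical raw intensities for a fluorescein-labelled probe.
    p = polarization(1000.0, 800.0)
    print(f"P = {p:.4f} ({1000 * p:.0f} mP), r = {anisotropy(1000.0, 800.0):.4f}")
```

Polarization is often reported in millipolarization units (mP, 1,000 × P); binding of dualENH slows probe tumbling and so raises P.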
Fluorescence resonance energy transfer assay. The fluorescence resonance
energy transfer (FRET) emission spectrum was measured at room temperature
with a Safire2 (Tecan) plate reader. The (TAA)₅ oligonucleotide was labelled at its
5′ terminus with either Cy3 or Cy5. dsDNA samples were prepared as
described earlier. Samples for the FRET experiment were prepared by mixing
400 nM Cy3-(TAA)₅ and 600 nM Cy5-(TAA)₅ for a reference, and then by adding
4 μM dualENH to observe a FRET signal change. The buffer contained 20 mM
Tris-HCl and 100 mM NaCl at pH 8.0.
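Why a FRET signal reports protein-mediated bridging can be seen from the Förster relation, E = 1/(1 + (r/R0)^6), together with the average spacing between freely diffusing DNA molecules at the concentrations used here. The sketch assumes R0 ≈ 5.4 nm for the Cy3-Cy5 pair (literature values are typically around 5-6 nm; this is not a value measured in this Letter) and estimates spacing as the cube root of the volume available per molecule. Both are simplifications, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # mol^-1
R0_CY3_CY5_NM = 5.4       # assumed Forster radius for Cy3-Cy5 (literature values ~5-6 nm)


def fret_efficiency(r_nm: float, r0_nm: float = R0_CY3_CY5_NM) -> float:
    """Forster transfer efficiency for one donor-acceptor pair at distance r."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def mean_spacing_nm(concentration_molar: float) -> float:
    """Cube root of the volume available per molecule, in nanometres."""
    litres_per_molecule = 1.0 / (concentration_molar * AVOGADRO)
    cubic_metres = litres_per_molecule * 1e-3   # 1 L = 1e-3 m^3
    return cubic_metres ** (1.0 / 3.0) * 1e9    # m -> nm


if __name__ == "__main__":
    # 400 nM Cy3-(TAA)5 + 600 nM Cy5-(TAA)5 = 1 uM total dsDNA.
    spacing = mean_spacing_nm(1.0e-6)
    print(f"mean spacing ~{spacing:.0f} nm, E ~{fret_efficiency(spacing):.1e}")
    print(f"E at r = R0: {fret_efficiency(R0_CY3_CY5_NM):.2f}")
```

At 1 μM total dsDNA the free fragments are on average >100 nm apart, where transfer is negligible, so a FRET increase on adding dualENH points to dye-labelled fragments being held within a few Förster radii of each other.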

Microscope imaging. All imaging was performed at room temperature on a
standard epifluorescence microscope (IX71, Olympus) equipped with bright-field
and fluorescence modalities. The imaging objective was a ×40, NA 0.75 objective
lens (UPLFLN 40X, Olympus). The (TAA)₅ oligonucleotides were labelled with
Cy3 at their 5′ terminus. The 11-nucleotide motif oligonucleotides
(CGCAGTGTAATTTAATTTCCTCGAC; motif 11 occupies positions 8-18) were labelled
with fluorescein at their 5′ terminus. The dsDNA samples were prepared as
described earlier. All experiments were done in the following buffer: 20 mM
Tris-HCl, 100 mM NaCl at pH 8.0, except that the NaCl concentration was
increased to 150 mM in Extended Data Fig. 7b.

Particle size distribution. We used quantitative fluorescence microscopy to 
determine the particle size distribution of irregularly shaped protein-DNA part- 
icles. We assumed that, for a particle formation process with no specific dimen- 
sional preferences, the particle volume is approximately proportional to the 3/2-th 
power of the particle area, which can be directly estimated from the fluorescence 
image. We plotted particle brightness against particle volume for particles with 
roundness (minor axis/major axis) >0.7, which represents ~35% of the popu- 
lation and spans all particle sizes, and applied a linear regression to determine the 
relationship between the particle brightness and volume for each image (averaged 
R² = 0.93). Using this brightness versus volume relationship, we calculated the
volume of every particle, including those smaller than the imaging resolution and 
those with low roundness, to obtain an overall particle size distribution. 
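The calibration procedure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: particle segmentation is assumed to have been done already, the `Particle` record and helper names are hypothetical, and an ordinary least-squares line with an intercept is assumed for the brightness-versus-volume fit.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    area: float        # projected area measured in the fluorescence image
    brightness: float  # integrated fluorescence intensity of the particle
    minor_axis: float
    major_axis: float

    @property
    def roundness(self) -> float:
        return self.minor_axis / self.major_axis


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = slope*x + intercept; returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def estimate_volumes(particles: list[Particle], min_roundness: float = 0.7) -> tuple[list[float], float]:
    """Calibrate brightness vs. volume on round particles, then apply it to all particles."""
    # Calibration set: round particles, whose volume is approximated as area ** 1.5.
    calib = [p for p in particles if p.roundness > min_roundness]
    if len(calib) < 2:
        raise ValueError("need at least two round particles for the calibration fit")
    volumes = [p.area ** 1.5 for p in calib]
    slope, intercept, r_squared = fit_line(volumes, [p.brightness for p in calib])
    # Invert the calibration so every particle, including small or elongated
    # ones, gets a volume estimate from its brightness alone.
    estimates = [(p.brightness - intercept) / slope for p in particles]
    return estimates, r_squared


if __name__ == "__main__":
    demo = [
        Particle(area=1.0, brightness=2.0, minor_axis=1.0, major_axis=1.0),
        Particle(area=4.0, brightness=16.0, minor_axis=2.0, major_axis=2.1),
        Particle(area=9.0, brightness=54.0, minor_axis=3.0, major_axis=3.2),
        Particle(area=3.0, brightness=10.0, minor_axis=1.0, major_axis=3.0),  # elongated
    ]
    volumes, r2 = estimate_volumes(demo)
    print([round(v, 2) for v in volumes], round(r2, 3))
```

The elongated particle (roundness 0.33) is excluded from the fit but still receives a volume estimate, mirroring how the text extends the calibration to low-roundness and sub-resolution particles.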

Atomic force microscopy. Samples were deposited on a mica surface in a buffer
containing 100 mM NaCl, 4 mM MgCl₂ and 20 mM Tris-HCl at pH 8.0. After a
2 min incubation, the mica surface was washed with 3 ml pure water and air-dried.
AFM images were taken using repulsive AC mode on an Asylum MFP-3D-bio
imager, with an AFM tip spring constant of 3 N m⁻¹. The scanning rate was 1 Hz.
X-ray crystallography. The dsDNA used for crystallization had forward and
backward sequences as follows: GTGTAATTTAATTTCC and CGGAAATTAAATTACA,
respectively. An equimolar mixture of the forward and backward
oligomers was heated to 95 °C for 10 min and gradually cooled to room
temperature. The dualENH (4.9 mM in 1.6 M sodium chloride and 20 mM MES
buffer at pH 5.8) and the dsDNA (5.6 mM in 10 mM Tris-HCl at pH 8.0) were
mixed in equal volumes. Protein-DNA co-crystals were grown at room
temperature in 0.2 M potassium thiocyanate and 20% w/v polyethylene glycol
3350 at pH 7.0 using hanging-drop vapour diffusion. Diamond-like crystals
appeared within 1-2 days. The crystals were soaked in 25% ethylene glycol
cryoprotectant and flash frozen in liquid nitrogen. Diffraction data were
collected at a wavelength of 1.03 Å at beamline 12-2 at Stanford Synchrotron
Radiation Lightsource. The best diffraction data had a resolution of ~3.1 Å.
Phases were obtained through molecular replacement using the wild-type ENH-DNA
co-crystal structure (PDB accession 3HDD)³⁸ as the search model. Further
refinement was done with PHENIX³⁹ and Coot⁴⁰. The final refined coordinates have
97.7% and 2.3% of backbone dihedral angles in the most favoured and additional
allowed regions, respectively. The data statistics are listed in Extended Data Table 2.


31. Clarke, N. D., Kissinger, C. R., Desjarlais, J., Gilliland, G. L. & Pabo, C. O. Structural studies of the engrailed homeodomain. Protein Sci. 3, 1779-1787 (1994).
32. Huang, P.-S., Love, J. J. & Mayo, S. L. Adaptation of a fast Fourier transform-based docking algorithm for protein design. J. Comput. Chem. 26, 1222-1232 (2005).
33. O'Shea, E. K., Klemm, J. D., Kim, P. S. & Alber, T. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 254, 539-544 (1991).
34. Das, R. & Baker, D. Macromolecular modeling with Rosetta. Annu. Rev. Biochem. 77, 363-382 (2008).
35. Allen, B. D. & Mayo, S. L. Dramatic performance enhancements for the FASTER optimization algorithm. J. Comput. Chem. 27, 1071-1075 (2006).
36. Dunbrack, R. L. Jr & Karplus, M. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J. Mol. Biol. 230, 543-574 (1993).
37. Stemmer, W. P., Crameri, A., Ha, K. D., Brennan, T. M. & Heyneker, H. L. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164, 49-53 (1995).
38. Fraenkel, E., Rould, M. A., Chambers, K. A. & Pabo, C. O. Engrailed homeodomain-DNA complex at 2.2 Å resolution: a detailed view of the interface and comparison with other engrailed structures. J. Mol. Biol. 284, 351-361 (1998).
39. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213-221 (2010).
40. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126-2132 (2004).
©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 1 | Design model of irregular bulk protein-DNA
nanoparticle. a, Four consecutive ENH-binding sites that each face in a
different direction are engineered onto a dsDNA fragment. This dsDNA
building block allows the protein-DNA assembly to occur in all three
dimensions. Note that in this particular design, two neighbouring binding sites
may not be simultaneously occupied due to steric hindrance. b, Cartoon
illustrating an irregularly shaped nanoparticle formed by co-assembly of
dualENH and the dsDNA shown in a. The DNA-binding domains of dualENH are
shown as blue triangles, and the homodimerization domains are shown as green
squares.
before thermal denaturation; dashed line: after thermal denaturation. The
overlapping of the two curves indicates that dualENH folds reversibly. …
dualENH was determined to be 59 °C.
Extended Data Figure 3 | Biophysical characterization of dualENH. a, Size-
exclusion chromatography of dualENH with three different loading
concentrations: solid line, 650 μM; dashed line, 80 μM; dotted line, 5 μM. The
highest signals were normalized to 100 for all curves. b, c(S) model fit from a
sedimentation velocity experiment of 40 μM dualENH. The major peak
around S = 1.9 corresponds to a molecular weight of 18.3 kDa, which is about 
twice that of monomeric dualENH (8.7 kDa). The spike at the left (S < 0.5) may 
be due to impurities or artefacts from model fitting. c, Raw data and fitting 
residuals for the sedimentation velocity experiment in b. A total of 378 curves 
were used for fitting, but for visual clarity only one-fifth of the curves are shown. 
The top graph shows the raw data (dots) and the fitting curves; the bottom 
shows the residuals between the experimental data and the fit. The square root 
of variance of the fit is 0.00669. d, Fluorescence polarization experiment. Two
dsDNA sequences labelled with fluorescein were used as probes to assay
dualENH-DNA binding. Probe 1: 20-nucleotide dsDNA with the binding
motif TAATTA; probe 2: same sequence as probe 1 but with a single-nucleotide
mutation to the binding motif (TA[C]TTA). The concentration of dualENH
was varied, while the concentration of the three probes remained constant
(25 nM). Data are shown as mean ± standard error of the mean (s.e.m.) for
three replicates. e, FRET experiment showing that dualENH brings two dsDNA
fragments within Förster distance. Fifteen-nucleotide dsDNA (TAA)₅ fragments
were labelled with either Cy3 or Cy5 to serve as the FRET donor or acceptor.
Grey line: 400 nM Cy3-(TAA)₅ plus 600 nM Cy5-(TAA)₅; black line: 400 nM
Cy3-(TAA)₅ plus 600 nM Cy5-(TAA)₅ plus 4 μM dualENH. f, Two control
experiments for the FRET experiment in e. Black line: 400 nM Cy3-(TAA)₅;
black dashed line: 400 nM Cy3-(TAA)₅ plus 4 μM dualENH; grey line: 600 nM
Cy5-(TAA)₅; grey dashed line: 600 nM Cy5-(TAA)₅ plus 4 μM dualENH.
wild-type ENH show saturated binding when the protein concentration is … shown as mean ± s.e.m. for three replicates.
Extended Data Figure 5 | Microscope imaging experiments. a, The size
distribution of the irregular protein-DNA particles formed by 5 μM dualENH
mixed with 2 μM Cy3-(TAA)₅. b, Bright-field microscopy image of 5 μM
dualENH mixed with 2 μM Cy3-(TAA)₅. A dust particle (top left) is evident,
indicating that the focal plane is correct. c, Fluorescence microscopy image
of 2 μM Cy3-(TAA)₅ alone. d, Fluorescence microscopy image of particles
formed with 500 nM dualENH mixed with 200 nM Cy3-(TAA)₅.
e, Fluorescence microscopy image of particles formed with 200 nM dualENH
mixed with 100 nM Cy3-(TAA)₅. f, Fluorescence microscopy image of
particle inhibition experiments. A small amount (5 nM) of single-binding-site
dsDNA (containing only one TAATTA motif) was pre-mixed with 500 nM
dualENH, then 200 nM Cy3-(TAA)₅ was added. The illumination/camera
sensitivity was enhanced to confirm that particle formation is nearly completely
absent under these conditions.
Extended Data Figure 6 | Co-crystal structure of the protein-DNA
complex. a, Structures in the asymmetric unit are shown in colour, and the
end-to-end packing of neighbouring DNA molecules and their bound
proteins is shown in grey. b, c, The dualENH homodimer observed in the
co-crystal structure (green or cyan) is superimposed with the design model
(grey). The backbone r.m.s.d. compared to the design model is 3.8 Å (green)
and 3.9 Å (cyan), respectively. When only one subunit is aligned between the
more predominant configuration (green) and the design model, the angular
displacement between the other subunits is ~45° with about 3 Å translational
displacement. The less predominant configuration has a lower angular
displacement, ~20°, but a larger translational displacement, ~8 Å. The
calculated energies for the design model and the two crystallographic dimers
are −155.2, −140.3 (green) and −131.2 (cyan) Rosetta energy units,
respectively.
Extended Data Figure 7 | dualENH-DNA binding and nanostructure
formation are inhibited at high salt concentrations. a, Fluorescence
polarization experiments of dualENH and probe 1 at various NaCl
concentrations. Probe 1 and dualENH were mixed in buffers with different NaCl
concentrations and fluorescence polarization values were recorded.
dualENH-DNA binding dropped greatly from 100 mM to 150 mM NaCl, and was
completely absent at 300 mM NaCl. Data are shown as mean ± s.e.m. for three
replicates. b, Fluorescence microscopy image of particle experiment at 150 mM
salt concentration. The sample was prepared by mixing 500 nM dualENH and
200 nM Cy3-(TAA)₅ in 20 mM Tris-HCl buffer with 150 mM NaCl. The
illumination/camera sensitivity was enhanced to confirm that particle
formation is nearly completely absent under these conditions.
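The salt dependence in this figure is consistent with electrostatic screening, whose range can be gauged from the Debye length. For a 1:1 electrolyte in water near 25 °C, a standard approximation is λ_D ≈ 0.304/√I nm, with the ionic strength I in mol l⁻¹. The sketch below applies it to the NaCl concentrations used here; as a simplifying assumption it ignores the contribution of the 20 mM Tris-HCl buffer to the ionic strength, and the function name is hypothetical.

```python
import math


def debye_length_nm(ionic_strength_molar: float) -> float:
    """Debye screening length in water near 25 C for a 1:1 salt: ~0.304 / sqrt(I) nm."""
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)


if __name__ == "__main__":
    # For NaCl (a 1:1 salt) the ionic strength equals the molar concentration.
    for nacl_mm in (100, 150, 300):
        print(f"{nacl_mm} mM NaCl: ~{debye_length_nm(nacl_mm / 1000):.2f} nm")
```

The screening length shrinks from roughly 1 nm at 100 mM to roughly 0.55 nm at 300 mM NaCl. This is only a qualitative guide to why binding weakens with salt, not a model of the measured polarization curves.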
Extended Data Table 1 | Sequences of wild-type ENH and dualENH

ENH (WT)   EKRPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKIKKST
dualENH    ---------E--KKA-DLA-YFD----PEW-RY--QR-----------------------

Dashes in the dualENH row denote residues identical to wild-type ENH. The three ‘coils’ at the top show the location of the three helices in the wild-type (WT) fold based on PDB
structure 1ENH. The homodimerization domain is located on the first and second helices, and the DNA-
binding domain is located on the third helix.
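The alignment in this table can be expanded into a full dualENH sequence and a mutation list. The sketch assumes that each dash means "same residue as wild type" and numbers residues 1-60 along the sequences as printed; under that reading, position 28 changes E to P, matching the E28P mutation named in Methods. The sequence strings are copied from the table, the function names are hypothetical, and the C-terminal Gly-Ser linker and His₆ tag are not included.

```python
WT_ENH = "EKRPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKIKKST"
# dualENH row of the table; '-' marks a residue identical to wild type.
DUALENH_ALIGNED = "-" * 9 + "E--KKA-DLA-YFD----PEW-RY--QR" + "-" * 23


def expand_alignment(reference: str, aligned: str) -> str:
    """Replace each '-' in `aligned` with the reference residue at that position."""
    if len(reference) != len(aligned):
        raise ValueError("reference and aligned rows must have equal length")
    return "".join(ref if alt == "-" else alt for ref, alt in zip(reference, aligned))


def list_mutations(reference: str, variant: str) -> list[str]:
    """Mutations in <wt><position><new> notation, numbering positions from 1."""
    return [
        f"{ref}{pos}{new}"
        for pos, (ref, new) in enumerate(zip(reference, variant), start=1)
        if ref != new
    ]


if __name__ == "__main__":
    dual_enh = expand_alignment(WT_ENH, DUALENH_ALIGNED)
    print(dual_enh)
    print(", ".join(list_mutations(WT_ENH, dual_enh)))
```

Read this way, all 17 substitutions fall within positions 10-37, on the first two helices, consistent with the table note that the homodimerization domain lies on helices 1 and 2 while helix 3 is left unchanged.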
Extended Data Table 2 | Co-crystal structure statistics for dualENH complexed with dsDNA containing motif 11 (PDB accession 4QTR)

                            4QTR
Data collection
  Space group               P4₂22
  Cell dimensions
    a, b, c (Å)             90.1, 90.1, 158.9
    α, β, γ (°)             90.0, 90.0, 90.0
  Resolution (Å)            39-3.1 (3.2-3.1)*
  Rmerge                    0.043 (1.456)
  I/σI                      28.4 (2.7)
  Completeness (%)          99.8 (99.8)
  Redundancy                2.7 (33)
Refinement
  Resolution (Å)            36-3.2
  No. reflections           14,287
  Rwork/Rfree               26/32
  No. atoms
    Protein/DNA             2903
    Ligand/ion              0
    Water                   0
  B-factors
    Protein/DNA             174.5
    Ligand/ion              n/a
    Water                   n/a
  R.m.s. deviations
    Bond lengths (Å)        0.012
    Bond angles (°)         1.32

One crystal was used for this coordinate set.
*Highest-resolution shell is shown in parentheses.
The amount of ice present in clouds can affect cloud lifetime,
precipitation and radiative properties’’. The formation of ice in 
clouds is facilitated by the presence of airborne ice-nucleating 
particles’”. Sea spray is one of the major global sources of atmo- 
spheric particles, but it is unclear to what extent these particles are 
capable of nucleating ice*"'. Sea-spray aerosol contains large 
amounts of organic material that is ejected into the atmosphere 
during bubble bursting at the organically enriched sea-air inter- 
face or sea surface microlayer’*"’. Here we show that organic 
material in the sea surface microlayer nucleates ice under condi- 
tions relevant for mixed-phase cloud and high-altitude ice cloud 
formation. The ice-nucleating material is probably biogenic and 
less than approximately 0.2 micrometres in size. We find that 
exudates separated from cells of the marine diatom Thalassiosira 
pseudonana nucleate ice, and propose that organic material assoc- 
iated with phytoplankton cell exudates is a likely candidate for the 
observed ice-nucleating ability of the microlayer samples. Global 
model simulations of marine organic aerosol, in combination with 
our measurements, suggest that marine organic material may be an 
important source of ice-nucleating particles in remote marine 
environments such as the Southern Ocean, North Pacific Ocean 
and North Atlantic Ocean. 

Atmospheric ice-nucleating particles (INPs) allow ice to nucleate 
heterogeneously at higher temperatures or lower relative humidity 
than is typical for homogeneous ice nucleation. Heterogeneous ice 
nucleation proceeds via different pathways depending on temperature 
and humidity. In low-altitude mixed-phase clouds, INPs are com- 
monly immersed in supercooled liquid droplets and freezing can occur 
on them at temperatures between −36 °C and 0 °C (ref. 2). At higher
altitudes and lower temperatures (less than −36 °C, the conditions
under which cirrus clouds form), nucleation occurs below water sat- 
uration, proceeding by homogeneous, deposition or immersion- 
in-solution nucleation’. Understanding the sources of atmospheric 
INPs is important because they affect cloud lifetime, cloud albedo and 
precipitation’*. Recent modelling work has shown that the ocean is 
potentially an important source of biogenic atmospheric INPs, particu- 
larly in remote, high-latitude regions”’®. However, it has never been 
directly shown that there is a source of atmospheric INPs associated 
with organic material found in marine waters or sea-spray aerosol. 

Figure 1 | Sea-spray aerosol particles enriched in organic material are
generated when bubbles burst at the air-sea interface. Surface-active organic
material of biological origin is scavenged at the interfaces of bubbles as they
rise through the water column. This process enriches the air-sea interface with
surface-active organic material, forming the SML (green layers). The organic
material is ejected on bubble bursting, with the resulting submicrometre film
drops being enriched with organic material compared with larger jet drops. We
show that the biogenic organic material in the SML is probably an important
source of atmospheric INPs that could influence cloud properties.

Organic material makes up a substantial fraction of submicrometre
sea-spray aerosol and it is estimated that 10 ± 5 Tg yr⁻¹ of primary
organic submicrometre aerosol is emitted from marine sources
globally'*. Rising bubbles scavenge surface-active organic material
from the water column at their interfaces and this process facilitates
the formation of the organic-enriched sea-air interface known as the
sea surface microlayer (SML). This organic material is ejected into
the atmosphere during bubble bursting, resulting in sea-spray aerosol
containing similar organic material to that of the microlayer'*"””
(Fig. 1). Measurements of atmospheric INP concentrations in the
remote oceans indicate that there may be a marine source of INPs 
linked to marine biology**, and modelling work indicates that this 
may be important in mixed-phase clouds”"®. There is also evidence to 
suggest that ice nucleation in cirrus clouds over ocean regions is influ- 
enced by sea-spray aerosol, with about 25% of the heterogeneously 
nucleated ice particle residuals identified as sea salt”®. Despite these 
indications that there is a marine source of atmospheric INPs, the SML 
has not previously been analysed for the presence of material capable 
of nucleating ice. To determine whether there is a source of atmo- 
spheric INPs in the microlayer, we examined the ice-nucleating 
properties of microlayer samples collected in the Arctic (July- 
August 2013), Atlantic (May-June 2014), North Pacific (June 2013), 
and in coastal locations off British Columbia, Canada (August 2013) 
(Extended Data Fig. 1). 

First, we present experiments relevant to mixed-phase clouds, in
which 1 μl droplets of microlayer samples from the Arctic and
Atlantic Oceans were placed on a cold stage immediately after
sampling and cooled until frozen. The fraction of droplets that froze as a
function of temperature, corrected for the freezing-point depression caused by
salts, is shown in Fig. 2a. The microlayer droplets consistently froze at
higher temperatures than droplets of subsurface water (SSW)
collected at depths of between 2 and 5 m at the same locations (Extended
Data Table 1). Filtration and re-testing of the microlayer samples
showed that most of the material that nucleated ice was between
approximately 0.02 and 0.2 μm in size (Extended Data Fig. 2a). Material of
this size has the potential to be lofted into the atmosphere through
bubble-bursting processes, forming atmospheric INPs that are
internally mixed with sea salt and other organics. To estimate
atmospheric INP concentrations associated with marine organics, we
determined the number of these INPs present per mass of organic
carbon (Fig. 2b; see Methods); this result is used in the modelling
section of this paper.
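The exponential fit quoted in the Fig. 2b caption, INPs per gram TOC = exp[11.2186 − (0.4459 × T)] with T in °C, can be evaluated directly. The sketch below is a minimal illustration; the function name is ours, and the fit should only be trusted within the temperature range covered by the Fig. 2b measurements.

```python
import math

# Coefficients of the exponential fit quoted in the Fig. 2b caption (T in deg C).
FIT_INTERCEPT = 11.2186
FIT_SLOPE = 0.4459


def inps_per_gram_toc(temp_c: float) -> float:
    """Cumulative INPs per gram of total organic carbon active at temp_c (deg C).

    Hypothetical helper; only meaningful inside the temperature range of the
    Arctic/Atlantic measurements used to derive the fit.
    """
    return math.exp(FIT_INTERCEPT - FIT_SLOPE * temp_c)


if __name__ == "__main__":
    for t in (-10.0, -15.0, -20.0):
        print(f"{t:6.1f} C -> {inps_per_gram_toc(t):.3e} INPs per g TOC")
    # Each 1 deg C of cooling multiplies the INP count by exp(0.4459), about 1.56.
    print(inps_per_gram_toc(-16.0) / inps_per_gram_toc(-15.0))
```

Because the fit is exponential in T, every 1 °C of cooling raises the predicted number of INPs per gram TOC by the same factor of about 1.56.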
Figure 2 | Ice nucleation by material in the SML. a, Immersion mode fraction
frozen curves for 1 μl Arctic and Atlantic SML and SSW droplets determined
using the microlitre nucleation by immersed particle instrument (μl-NIPI), with
example temperature uncertainties included. b, The cumulative number of INPs
per gram of total organic carbon (TOC). Selected samples were diluted with
ultrapure (Milli-Q) water to 10% and 1% of initial concentration. Uncertainties are
included where error bars are larger than data points (see Methods for details).
The equation for the fit to the data is INPs per gram TOC = exp[11.2186 − (0.4459 × T)];
note that temperature, T, is in °C. c, Ice nucleation by British Columbia (BC) coast
and North Pacific samples under cirrus conditions. Example University of
Toronto continuous flow diffusion chamber (UT-CFDC) activation curves under
cirrus conditions (with background counts subtracted) for sea salt (SS), NaCl,
British Columbia coast SML and SSW samples at −40 °C. d, Ice nucleation onset
RHice for Pacific and British Columbia coast SML, SSW, sea salt and NaCl aerosol
particles. For comparison, onsets for ATD and K-feldspar (orthoclase) are
shown. Ice-onset error bars represent one standard deviation based on three to
four replicates. The solid line represents the water saturation line, and the dashed
line is the theoretical homogeneous freezing threshold.

Experiments were also conducted under conditions relevant to
cirrus clouds using microlayer and SSW samples from the North
Pacific and the British Columbia coast. The activity of the collected
samples was tested at −40 °C and compared with results from
experiments with commercial sea salt and NaCl particles. Figure 2c shows
example activation curves for aerosolized, dried and size-selected
(200 nm diameter) particles. The SSW activation curves are very
similar to those of sea salt and NaCl, with all showing sharp increases
above 143% relative humidity with respect to ice (RHice). This is
consistent with homogeneous ice nucleation of solution droplets
and suggests that crystalline salt particles did not contribute markedly
to ice nucleation events observed at low RHice. In contrast, aerosol
particles derived from microlayer samples all showed ice formation
above the background level at lower RHice. The ice nucleation onset
(the RHice at which 1% of the aerosol particles were activated; Fig. 2d)
varied between 115% and 133% RHice, which is comparable with
efficient deposition mode INPs, such as Arizona test dust (ATD)
and feldspar dust (orthoclase) particles of the same size. Filtration
of the SML samples through filters with nominal pore sizes of 0.2 μm
increased the ice nucleation onset by about 12-16 ± 4% RHice,
indicating that some ice-active material larger than 0.2 μm was present
(Extended Data Fig. 2b). However, the onsets for the 0.2-μm-filtered
samples remained well below the homogeneous threshold, indicating
that there is a population of smaller INPs in the British Columbia and
Pacific microlayer samples, consistent with the Arctic and Atlantic
data. The cumulative number of ice nucleation sites per surface area
as a function of RHice, ns(RHice), was greater for the microlayer
samples than that found for ATD, kaolinite and feldspar mineral dusts
(Extended Data Fig. 3; see refs 1, 2 for more information on ns).

We emphasize that, owing to the different nucleation processes
involved, ice nucleation under mixed-phase cloud and cirrus
conditions cannot be quantitatively compared. However, our results do
clearly show that the microlayers at all the sampling locations were
enriched in INPs compared with the SSW at the same locations. Here
we present experiments designed to reveal likely candidates for
the source of the ice-nucleating activity.

Certain proteins have been identified as being highly ice active;
however, heat denatures proteins, causing a reduction in activity,
which is not the case for known inorganic INPs. To test whether
similar thermally labile material could be responsible for the observed
ice nucleation, microlayer samples were heated at temperatures of up to
100 °C and retested for activity. Heating of Arctic and Atlantic samples
resulted in a reduction of the ice nucleation activity, with freezing
shifting to lower temperatures (Extended Data Fig. 2c). Similarly,
the onset RHice of a British Columbia coast sample increased by
6 ± 4% RHice, whereas that of a Pacific sample was within uncertainty
of the unheated sample (Extended Data Fig. 2b). The Pacific result might
be consistent with the presence of inorganic or other non-thermally labile
INPs. The marked reduction in activity in samples from three out of
four locations is consistent with the presence of thermally labile
biological INPs.

The filtration tests on the sampled microlayers (mentioned earlier)
show that there is a considerable population of INPs that pass through
0.2 μm filters (Extended Data Fig. 2a, b). The ice-active materials are
therefore probably smaller biological particles, for example,
ultramicrobacteria, viruses or extracellular material from phytoplankton or
bacteria (exudate). Additionally, no correlation was found between
freezing temperature and bacterial cell counts in the Arctic microlayer
(Extended Data Fig. 4), which suggests that whole bacterial cells were
not responsible for the observed ice nucleation. Given that terrestrial
biological systems such as pollens and fungi have been found to
produce nanoscale or ‘macromolecular’ INPs unconnected with whole
cells, we considered the possibility that marine INPs are associated
with exudates from phytoplankton or other marine microorganisms.
This hypothesis is supported not only by the filtration tests, but also by a
tentative correlation between the North Pacific microlayer sample ice
activation onsets and both the dissolved organic carbon concentration
(DOC; >0.2 μm) and polysaccharide-rich transparent exopolymer
particles (TEPs), which are associated with phytoplankton exudates
(Extended Data Fig. 5).

Qualitative compositional analysis of two Arctic samples using
scanning transmission X-ray microscopy coupled with near-edge
X-ray absorption fine structure spectroscopy (STXM/NEXAFS; Fig. 3a-c)
indicates the presence of both diatom cell wall and exudate
compounds (see Methods). Spectra of exudates from the ubiquitous marine
diatom T. pseudonana share absorption features with the microlayer
samples. These data are in keeping with studies showing that
diatom exudates are present in microlayer samples, and consistent
with the fact that diatoms are the dominant phytoplanktonic group in
polar regions.

Diatom cells and fragments have been shown to nucleate ice
heterogeneously but, as demonstrated, whole cells are not solely
responsible for the ice nucleation activity we observe in the microlayer
samples. Here we investigate whether exudates separated from
T. pseudonana diatom cells can nucleate ice heterogeneously. The
ice nucleation efficiency of exudate from an axenic unialgal culture
of T. pseudonana filtered through a 0.1 μm filter was measured as a
function of temperature and water activity (aw) in nanolitre volume
droplets. Exudate freezing temperatures were found to be similar to
those of washed diatom cells in the absence of exudate material and
approximately 9-13 °C warmer than observed homogeneous freezing
temperatures of 0.2-μm-filtered and autoclaved Atlantic sea water
collected 100 km offshore of Long Island, New York (the same water
used to culture diatoms) with and without added nutrients (Fig. 3d;
freezing curves are shown in Extended Data Fig. 6). Although the freezing
temperatures shown in Fig. 3d are not directly comparable to the
microlayer droplet experiments in Fig. 2a, as they used much smaller
droplets, they do show that material associated with exudates can
nucleate ice.
Figure 3 | Spectroscopic analysis of Arctic SML samples and freezing
experiments with diatom exudate. a, b, X-ray images of Arctic SML5 (a) and
SML19 (b). Example locations at which spectra were acquired are indicated
by green and blue boxes. c, X-ray absorption spectra of organic material in
SML5, SML19 and exudates from the diatom T. pseudonana. d, Freezing and
melting temperatures, collected using the water-activity-controlled immersion
freezing experiment (WACIFE), as a function of water activity (aw) for
nanolitre volume droplets containing T. pseudonana exudates, and filtered and
autoclaved natural sea water with and without f/2 nutrients. Heterogeneous
ice nucleation temperatures in the presence of diatom cells and homogeneous
ice nucleation of aqueous NaCl droplets are shown for comparison.
Vertical error bars represent 10th and 90th percentiles of about 300
individual freezing events. Horizontal error bars indicate the uncertainty in
aw of ±0.01.

Given the ice nucleation activity of exudates from T. pseudonana,
the presence of similar material in Arctic microlayer samples and
the results of the filtering and heating experiments, we suggest that
biogenic INPs present in phytoplankton exudates are a good candidate
for the source of activity observed in the sampled microlayers.
A substantial mass fraction of submicrometre sea-spray aerosol is
organic material, which is often associated with phytoplankton
exudates. Our results therefore indicate that some fraction of
sea-spray aerosol particles will be capable of nucleating ice.

To explore the possible contribution of marine biogenic INP sources
to the global atmospheric INP distribution, we combined our
experimental data with the modelled distribution of emitted primary organic
material in sea-spray aerosol (Fig. 4a; see Methods for details).
To calculate atmospheric INP concentrations, we assume that the
organic component of sea-spray aerosol simulated by the model has
a temperature-dependent INP concentration (per mass of organic
carbon) equal to that measured in the Arctic and Atlantic samples
(Fig. 2b). In other words, the number of INPs is directly related to
the mass of organic carbon computed by the model. At our current
level of understanding, this can be considered as an estimate of the
number of INPs present in the organic component of sea-spray
aerosol. The predicted surface-level marine organic INP concentrations
([INP]−15; the concentration of INPs if an air parcel were cooled to
−15 °C at water saturation) are also shown in Fig. 4a. The largest
concentrations of marine INPs occur over the Southern Ocean, the
North Atlantic and the North Pacific, in regions where biological
activity in surface ocean waters and wind speeds are greatest. The
simulated annual mean marine INP concentrations at sea level
agree to within ±1 order of magnitude with the Bigg atmospheric INP
measurements in the Southern Ocean and around the coast of
Australia (Fig. 4b), whereas mineral dust from deserts (the major
terrestrial INP source) accounts for only a small fraction of the observed INPs
in this region. We also find good agreement with measurements made
previously in the southern Pacific. We note that some of the INP
measurements made in the equatorial Pacific tend to be
underpredicted by our model, implying either another source of INPs or a
stronger marine INP source than predicted here, possibly related to
short-term variability in ocean biota.
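The modelling assumption described above (INP number proportional to the modelled organic carbon mass, scaled by the Fig. 2b fit) reduces to a single multiplication. The sketch below assumes the organic carbon loading is supplied in μg m−3 and is already expressed as carbon; the function names and example loadings are hypothetical and are not model output.

```python
import math


def inps_per_gram_toc(temp_c: float) -> float:
    """Fig. 2b fit: cumulative INPs per gram of organic carbon at temp_c (deg C)."""
    return math.exp(11.2186 - 0.4459 * temp_c)


def marine_inp_per_m3(organic_carbon_ug_m3: float, temp_c: float) -> float:
    """INPs per m^3 if an air parcel were cooled to temp_c at water saturation.

    Assumes, as in the text, that INP number scales linearly with the modelled
    organic carbon mass; any organic-mass-to-carbon conversion is left to the caller.
    """
    organic_carbon_g_m3 = organic_carbon_ug_m3 * 1e-6  # micrograms -> grams
    return organic_carbon_g_m3 * inps_per_gram_toc(temp_c)


if __name__ == "__main__":
    # Illustrative organic carbon loadings only; not taken from the model.
    for oc in (0.01, 0.1, 0.5):
        print(f"OC = {oc} ug m-3 -> [INP]-15 = {marine_inp_per_m3(oc, -15.0):.2f} m-3")
```

Because the relationship is linear in organic carbon mass, the geographical pattern of [INP]−15 in such a calculation simply follows that of the modelled organic aerosol.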

We also use the model to assess the transport of marine INPs to
altitudes relevant for mixed-phase clouds. Figure 4c shows the
concentration of INPs active at −20 °C from marine sources at 850 hPa
in comparison with the contribution from desert dusts based on
K-feldspar distributions. This suggests that marine biogenic sources
of INPs are competitive with, or more important than, desert sources
in large parts of the Southern Ocean, the North Atlantic and the North
Pacific. To assess whether marine organic INPs exist in regions of
the atmosphere that are sufficiently cold for them to activate to ice,
we plot 3-monthly averaged seasonal distributions of INPs active at the
ambient temperature ([INP]ambient) along a transect through the
Atlantic (at 30° W; Fig. 4d). This plot suggests that marine organic
sea spray may contribute markedly to cloud glaciation at high and
mid-latitudes during wintertime in the respective hemispheres. It also
shows strong seasonal differences, driven by temperature and
source strength, in the relative contributions of marine and dust
sources to [INP]ambient.

We show that surface-active organic material from the microlayer
that is similar to that found in sea-spray aerosol nucleates ice, and we
tentatively identify the ice-active material as being connected to diatom
exudates. Our findings also suggest that marine organic material
may be an important global source of atmospheric INPs, particularly
in areas remote from terrestrial sources, such as the Southern
Ocean. This work highlights the need for more field measurements
of remote atmospheric INP concentrations and investigation of their
relationship with ocean surface water characteristics, including the
local phytoplankton community, organic carbon concentrations and
chemical characteristics, as well as the organic loading and nature of
sea-spray aerosol.

Figure 4 | Global distribution of atmospheric marine biogenic INPs.
a, Modelled distribution of INP concentration active at −15 °C (m−3) and
surface-level marine aerosol organic mass concentration (μg m−3); the
locations of Bigg (circles) and Rosinski (triangles) data are shown.
b, Comparison of model-simulated INP concentration versus the Bigg and
Rosinski measured concentrations for the same location at the activation
temperature of the measurements; open symbols are for K-feldspar only and
filled symbols are for mineral dust plus marine organic. Example error bars
shown on red points are based on the spread in INPs per gram TOC (see Fig. 2b).
c, Modelled distribution of marine biogenic INP concentrations active at
−20 °C at 850 hPa (corresponding to the altitude of high-latitude mixed-phase
clouds). Black contours indicate the INPs from desert dust based on K-feldspar
emissions. d, Seasonal altitude profiles (expressed as pressure) showing
[INP]ambient (INP concentration active at local temperature conditions) from
marine sources (colour scale) and K-feldspar (black contours), for a transect
from the South to North poles through the Atlantic (30° W).


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

SML and SSW sampling. SML sampling took place in the Arctic during the 
Aerosol-Cloud Coupling and Climate Interactions in the Arctic (ACCACIA) 
cruise, in the Atlantic as part of the Western Atlantic Climate Study II (WACS 
II) cruise, as well as in the Northeast Pacific and off the southern coast of British 
Columbia, Canada, as part of the Network on Climate and Aerosols: Addressing 
Key Uncertainties in Remote Canadian Environments (NETCARE) project 
(Extended Data Fig. 1). See Extended Data Table 1 for precise sampling locations. 

During the ACCACIA campaign in the Arctic, microlayer sampling was
conducted from the RRS James Clark Ross in both open waters and within leads in the
marginal ice zone (Extended Data Table 1 and Extended Data Fig. 1a). Microlayer
samples were collected into borosilicate glass bottles from a hydrophilic Teflon
film on a rotating drum fitted to the ‘Interface II’ remote-controlled sampling
catamaran. SSW was sampled with Niskin bottles on a CTD (conductivity,
temperature, depth) rig at the same locations, generally at a depth of ~2 m. To
avoid potential contamination from the ship, the microlayer sampler was
navigated to a distance of 75-200 m upwind of the stationary ship before sampling
commenced. Owing to rougher conditions during the WACS II campaign in the
Atlantic, the Interface II was tethered to the CTD arm of the R/V Knorr during
microlayer sampling. SSW was collected either using the direct uncontaminated
ship input at 5 m water depth or using a sampling container lowered over the side
of the ship (for details see Extended Data Table 1). During both campaigns, before
and after microlayer sampling, sea water from the ship’s uncontaminated supply
was flushed through Interface II’s sampling system to clear any previously
collected microlayer. Samples of this ‘flushing’ water were collected, analysed and
compared with the SSW using droplet freezing assays to check that INPs from
previous sampling had been removed (Extended Data Fig. 8). Microlayer samples
for water activity and organic carbon analysis were frozen at −80 °C immediately
after collection.

During the NETCARE campaign, SML samples and corresponding SSW
samples from water depths of 0.5-1.0 m were collected in the Northeast Pacific Ocean
at three different locations (Extended Data Table 1 and Extended Data Fig. 1c).
The samples were collected using a glass plate, with the exception of British
Columbia (BC) coast SML3, which was collected using an autoclaved stainless
steel screen. All samples were stored in high-density polyethylene bottles. North
Pacific samples (Pacific SML1, 2 and 3) were kept frozen at −20 °C after collection;
before the experiment they were thawed and stored at 4 °C in the dark.
All other samples were stored at 4 °C in the dark for no more than 10 days
before analysis.

Effect of different sampling techniques on INP abundance in Pacific samples.
BC coast samples SML3 and SML4 were sampled at the same location 1 h apart but
using metal mesh and glass plate techniques, respectively. There is a significant
difference in the onset humidity of the two samples, with the onset of ice formation
for BC coast SML3 occurring 13% RHice lower than for BC coast SML4. Given the
time lag between collection of the two samples, and because the methods sample
different thicknesses of the SML (the metal mesh collects layers 2-4 times thicker
than those collected with glass plates), it is not clear whether the difference in their
onset RHice was due to the different sampling methods.

Ice nucleation experiments during ACCACIA, WACS II and NETCARE.
Arctic and Atlantic SML and SSW samples were analysed for the presence of
INPs using the previously described microlitre nucleation by immersed
particle instrument (μl-NIPI). Briefly, droplets with a volume of 1.0 ± 0.1 μl were
pipetted onto a hydrophobic microscope coverslip (Extended Data Fig. 9a) and
cooled at a rate of 1 K min−1 using a Grant-Asymptote EF600 cold stage (Extended
Data Fig. 9b) until all droplets were frozen. The temperature at which individual
droplets froze was recorded, with an uncertainty of ±0.4 °C. Experiments were
also performed using diluted microlayer samples (Arctic SML3, 6 and 16, plus Arctic
SMLS filtered at 0.2 μm and Arctic SML12 filtered at 10 μm), in which 1 ml of
microlayer was added to 9 ml (10% dilution) or 99 ml (1% dilution) of 18.2 MΩ cm distilled
water (Milli-Q). The water activity (aw) of Arctic microlayer and SSW samples was
measured at 25 °C using an Aqualab Series 3 dew point activity meter.

Samples from the North Pacific and BC coast were analysed using the
University of Toronto continuous flow diffusion chamber (UT-CFDC). Owing
to the high total submicrometre particle concentration of both the microlayer and
SSW samples (10° cm−3 after drying, measured with a TSI 3782 condensation
particle counter), they were diluted by a factor of approximately 20 with
18.2 MΩ cm water before atomization using a TSI 3076 atomizer. Water was
removed by passing the sample flow through three diffusion dryers, and the
particle concentration was further lowered by dilution (Extended Data Fig. 9c).
Particles with a mobility diameter of 200 nm were selected using a
differential mobility analyser (TSI 3081), with size-selected particle concentrations
of approximately 100 cm−3. The size-selected particles were exposed to
ice-supersaturated conditions at −40 °C in the UT-CFDC to determine their ice nucleation
onset humidities. Particle counts from both channels (>0.5 μm and >5 μm) of
the optical particle counter (Climet CI-20) were used to distinguish between
interstitial aerosol particles and ice particles. The standard solutions consisted
of 8-10 mg of NaCl (Sigma Aldrich, S2830) and commercial sea salt (Sigma
Aldrich, S9883) dissolved in 50 ml of 18.2 MΩ cm water. Control experiments
with filtered air were conducted in the field.
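The onset criterion used for the cirrus experiments (the RHice at which 1% of the aerosol particles are activated) can be extracted from a humidity scan as sketched below. This is a hypothetical illustration, not the UT-CFDC processing code: it assumes that background-subtracted counts in the larger optical-counter channel represent ice crystals, and it interpolates linearly between scan steps, which the text does not specify.

```python
from typing import Optional, Sequence, Tuple


def onset_rh_ice(
    rh_ice: Sequence[float],
    ice_counts: Sequence[float],
    background_counts: Sequence[float],
    aerosol_conc: float,
    threshold: float = 0.01,
) -> Optional[float]:
    """Return the RHice (%) at which the activated fraction first reaches threshold.

    ice_counts and background_counts are concentrations in the large-size OPC
    channel (treated here as ice crystals) at each RHice step, in the same units
    as aerosol_conc. Linear interpolation between steps is an assumption.
    """
    previous: Optional[Tuple[float, float]] = None  # last step below threshold
    for rh, ice, background in zip(rh_ice, ice_counts, background_counts):
        fraction = max(ice - background, 0.0) / aerosol_conc
        if fraction >= threshold:
            if previous is None:
                return rh  # already above threshold at the first step
            rh0, f0 = previous
            return rh0 + (threshold - f0) * (rh - rh0) / (fraction - f0)
        previous = (rh, fraction)
    return None  # threshold never reached in this scan


if __name__ == "__main__":
    # Synthetic scan: 100 particles cm-3 of 200 nm aerosol, no background counts.
    rh = [110.0, 115.0, 120.0, 125.0]
    ice = [0.0, 0.5, 1.5, 5.0]
    # The 1% crossing lies between the 115% and 120% steps.
    print(onset_rh_ice(rh, ice, [0.0] * 4, aerosol_conc=100.0))
```

Subtracting the background before dividing by the aerosol concentration mirrors the background-subtracted activation curves shown in Fig. 2c.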
Ice nucleation experiments with diatom exudates. For diatom exudate freezing
experiments, axenic unialgal cultures of T. pseudonana were grown in flasks at
16-18 °C with a 14 h light:10 h dark cycle in 0.1-μm-filtered and autoclaved sea water
with f/2 nutrient addition. Sea water was collected at a depth of about 0.5 m,
approximately 100 km off the coast of Long Island, New York. After 1 week of growth, when
concentrations reached ~10° cells ml−1, cultures were filtered through a 0.1 μm
pore size filter to remove the cells, yielding a suspension of diatom exudates.
Droplets of filtered diatom exudate were analysed using the
water-activity-controlled immersion freezing experiment (WACIFE) instrument. Individual
droplets were deposited in a grid pattern onto a hydrophobic glass plate.
Additional droplets were generated from filtered and autoclaved natural sea water
with and without f/2 nutrient addition before diatom growth. Droplet aw was
established by allowing the temperature-controlled droplets to come to
equilibrium in a humidity-controlled aerosol conditioning cell. Droplets were then
sealed from ambient air, setting the droplets’ aw equal to the applied RH. Ice
nucleation was observed at a cooling rate of 10 K min−1 using a cryo-cooling
stage coupled to an optical microscope (see schematic of process in Extended Data
Fig. 9d). Droplet sizes ranged from 60 to 129 μm in circle-equivalent diameter
(82 μm average). Individual droplet volumes were calculated from the spherical
equivalent diameter derived from the digitally measured particle diameters,
corrected for the non-sphericity of the deposited particles and for the different applied aw.
The total numbers of droplets at each investigated aw for sea water, sea water plus
f/2, and sea water plus f/2 plus diatom exudates were 115, 143 and 131,
respectively. Homogeneous ice nucleation was observed for droplets generated
from the sea water with and without f/2. The median freezing temperatures shown
in Fig. 3d are given with 10th and 90th percentiles. Corresponding mean melting
temperatures are shown with an uncertainty of 1 standard deviation. The uncertainty
in aw is ±0.01. The ice melting curve (dashed line) and the volume-corrected
homogeneous freezing curve with an uncertainty in aw of ±0.01 (shown as the
solid black line and grey shaded area, respectively) are parameterized as described
previously.
Freezing depression correction for immersion mode experiments. Data from
the Arctic, Atlantic and diatom exudate immersion mode experiments were
adjusted to account for the freezing-point depression caused by dissolved salts in sea
water. First, a freezing curve as a function of aw was constructed through the
median freezing points following the aw criterion, in which median immersion
freezing temperatures can be described by a horizontal shift in the ice melting curve.
Then the difference between the expected median freezing temperatures for pure
water (that is, at aw = 1.0) and at the experimentally applied aw was used as a
temperature offset.
Calculation of INPs per gram of organic carbon and associated uncertainty. In
Fig. 2b we show the cumulative number of INPs per gram of TOC as a function of
temperature for the Arctic and Atlantic microlayer samples. This calculation uses
the time-independent singular description of ice nucleation, which assumes that the
time dependence of freezing is of secondary importance to the distribution of
ice-nucleating particle types. In this case, ‘INPs per gram of TOC’ is the same as ‘fy’
described in detail elsewhere. It should also be noted that this is the same model
that was used to calculate ns (ref. 1) for the BC coast and North Pacific samples,
although ns describes the number of ice-active sites per surface area rather than per mass.
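Under the singular description, the cumulative INP spectrum per unit volume of droplet water is K(T) = −ln(1 − f(T))/V, where f is the fraction of droplets frozen by temperature T and V is the droplet volume. Dividing by the TOC concentration gives the mass-normalized spectrum of Fig. 2b, and dividing −ln(1 − f) by particle surface area instead gives ns. The sketch below is a minimal illustration of that standard relationship; the function names and example inputs are hypothetical, and treating the 200 nm mobility diameter as a geometric sphere diameter is our simplification.

```python
import math
from typing import Sequence


def frozen_fraction(freeze_temps_c: Sequence[float], temp_c: float) -> float:
    """Fraction of droplets that have frozen once the stage has cooled to temp_c."""
    return sum(1 for t in freeze_temps_c if t >= temp_c) / len(freeze_temps_c)


def cumulative_inps_per_litre(frac_frozen: float, droplet_volume_l: float) -> float:
    """Singular description: K(T) = -ln(1 - f(T)) / V, with V in litres."""
    if frac_frozen >= 1.0:
        raise ValueError("all droplets frozen; K(T) is unbounded at this temperature")
    return -math.log(1.0 - frac_frozen) / droplet_volume_l


def inps_per_gram_toc(frac_frozen: float, droplet_volume_l: float,
                      toc_g_per_l: float) -> float:
    """Mass-normalized spectrum: K(T) divided by the TOC concentration."""
    return cumulative_inps_per_litre(frac_frozen, droplet_volume_l) / toc_g_per_l


def ns_per_m2(frac_activated: float, particle_diameter_m: float) -> float:
    """Surface-normalized spectrum n_s = -ln(1 - f) / A for monodisperse spheres."""
    area_m2 = math.pi * particle_diameter_m ** 2  # surface area of one sphere
    return -math.log(1.0 - frac_activated) / area_m2


if __name__ == "__main__":
    # Hypothetical example: ten 1 ul droplets (1e-6 L) and a TOC of 1 mg per litre.
    temps = [-12.0, -14.5, -15.0, -16.2, -17.0, -18.3, -19.1, -20.4, -21.0, -22.5]
    f = frozen_fraction(temps, -17.0)
    print(f, inps_per_gram_toc(f, 1e-6, 1e-3))
    # 1% of 200 nm particles activated, the onset criterion used for the cirrus runs.
    print(ns_per_m2(0.01, 200e-9))
```

The same −ln(1 − f) term appears in both normalizations; only the denominator (sample volume times TOC, or particle surface area) changes.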
For all data points shown in Fig. 2b, error bars are based on the propagated
uncertainties associated with volume measurements and organic carbon
concentration measurements (Extended Data Fig. 7). The error bars for experiments in
which microlayer samples were diluted with Milli-Q water also include the
uncertainty relating to the subtraction of background heterogeneous nucleation events.
Freezing in μl-NIPI experiments using Milli-Q water droplets free from any
added nucleants occurs at higher temperatures than predicted for homogeneous
ice nucleation. Using the results of 22 separate freezing experiments (727
droplets in total), the cumulative number of INPs per volume of Milli-Q was calculated.
A line of best fit to these data as a function of temperature was used to estimate
the number of background INPs present in our diluted microlayer samples. This
value was subtracted from the diluted microlayer cumulative INP spectra, and
uncertainties relating to the variation in background INP concentrations were
calculated based on the 68% confidence interval associated with the line of best fit.
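The background correction can be sketched as a fit followed by a subtraction. The functional form of the line of best fit is not stated above, so this sketch assumes a straight line in ln(K) versus T fitted by ordinary least squares, and it omits the 68% confidence band; the function names and synthetic data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_linear(temps_c: Sequence[float],
                   k_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(K) = a + b*T to a background INP spectrum; returns (a, b)."""
    n = len(temps_c)
    ys = [math.log(k) for k in k_values]
    mean_t = sum(temps_c) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps_c, ys))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def subtract_background(temps_c: Sequence[float], k_sample: Sequence[float],
                        a: float, b: float) -> List[float]:
    """Subtract the fitted background K(T) from a diluted sample spectrum.

    Both spectra must be in the same units (for example INPs per litre of
    droplet water); negative differences are clipped to zero.
    """
    return [max(k - math.exp(a + b * t), 0.0) for t, k in zip(temps_c, k_sample)]


if __name__ == "__main__":
    # Synthetic Milli-Q background that follows ln K = 1.0 - 0.4 T exactly.
    bg_t = [-20.0, -22.0, -24.0, -26.0]
    bg_k = [math.exp(1.0 - 0.4 * t) for t in bg_t]
    a, b = fit_log_linear(bg_t, bg_k)
    print(a, b)
    print(subtract_background([-25.0], [1.0e6], a, b))
```

Clipping at zero avoids negative INP counts where a diluted sample is indistinguishable from the Milli-Q background.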
STXM/NEXAFS analysis of Arctic microlayer samples. STXM/NEXAFS ana- 
lysis was used to explore qualitatively the carbon functionality of organic material 
found in two Arctic SML samples (Arctic SML5 and 19). Analyses were performed 
at the Advanced Light Source on beamline 5.3.2.2, Lawrence Berkeley National 
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Laboratory*’. Overviews of the application of this technique to atmospheric part- 
icles and technical details on STXM methodology have been published else- 
where***’ The sample for STXM/NEXAFS analysis was collected by bringing 
the flat face of clean silicon nitride windows into contact with the pre-collected 
SML water surface and then lifting them off. Material adhering to the windows was 
allowed to air dry before examination. The transmission of soft single-energy 
X-ray photons across the raster-scanned sample was measured to obtain an 
image* exploiting the carbon K-edge spectra to identify carbon functionality 
and the overall contribution of inorganic components. Using X-ray energies 
278-320 eV, X-ray absorption of the ground state electron (1s orbital) of the 
carbon atom was probed to identify carbon-carbon double bonding (C=C), car- 
bonyl (C=O), hydroxyl (C-OH) and carboxyl (COOH) functional groups. 
Figure 3a shows an X-ray image of particulate material found in the Arctic 
SMLS5 sample taken at 320 eV, which includes organic and inorganic material 
and a frustule fragment. The spectrum for SML5 is characterized by (1) a dom- 
inant carbonyl peak (288.2 eV), (2) a secondary carbon double-bonding peak 
(285.4 eV), and (3) a gradual rise in absorption in the energy range where the 
hydroxyl functional group absorbs (287.3 eV). Similar spectra for diatom cell wall 
material have previously been observed*'. The spectrum observed from SML19 is 
similar to that of the spectrum obtained from diatom exudates and is characterized 
by (1) a dominant carboxyl peak (288.6 eV) and (2) a gradual rise in absorption in 
the energy range where the hydroxyl functional group absorbs (287.3 eV). Similar 
spectral features have been observed previously” for field-collected marine part- 
icles, which the authors attributed to the presence of polysaccharides, and also” for 
particles aerosolized from a laboratory breaking wave. These features were also 
observed in carbon absorption spectra™ for 22 amino acids, all of which are present 
in T. pseudonana*’, supporting the notion that diatom exudates were present in 
this microlayer sample. The presence of spectral features similar to those of diatom cell wall
fragments and diatom exudate material suggests that diatoms were present in both
SML5 and SML19. It is important to note that marine SMLs'*!*°*°8, SSWs”? and
biofilms” are composed of a complex mixture of inorganic particles, particulate
organic matter in the form of microorganisms, biogenic debris, polysaccharide-enriched
microgels, lipids, proteins and amino acids. Therefore, the X-ray spectra
shown here will not be identical for different SML samples. Instead, key spectral
features, including peak locations and dominance, represent typical biogenically
derived materials in the marine environment.
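The transmission measurement and peak assignments described above can be summarized in a short sketch. It assumes the Beer-Lambert relation OD = −ln(I/I0) for turning transmitted intensity into absorption, and assigns a measured peak to the nearest of the four energies quoted in the text. The 0.2 eV tolerance and the helper names are hypothetical, and real NEXAFS analysis fits overlapping peaks rather than using a nearest-neighbour lookup.

```python
import math

# Carbon K-edge peak positions (eV) quoted in the text for each functional group.
PEAKS_EV = {
    "C=C": 285.4,
    "C-OH": 287.3,
    "C=O": 288.2,
    "COOH": 288.6,
}


def optical_density(transmitted: float, incident: float) -> float:
    """Beer-Lambert optical density OD = -ln(I / I0) from transmitted and incident intensity."""
    if transmitted <= 0 or incident <= 0:
        raise ValueError("intensities must be positive")
    return -math.log(transmitted / incident)


def assign_peak(energy_ev: float, tolerance_ev: float = 0.2):
    """Return the group whose quoted peak is nearest to energy_ev, or None if none lies within tolerance_ev."""
    group, ref = min(PEAKS_EV.items(), key=lambda item: abs(item[1] - energy_ev))
    return group if abs(ref - energy_ev) <= tolerance_ev else None
```

For example, a peak observed at 288.55 eV is assigned to the carboxyl group, while one at 280 eV matches none of the listed features.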
Microlayer filtering and heating tests. Selected fresh Arctic SML samples
were passed through filters with a range of pore sizes (0.02 µm Whatman Anodisc,
0.1 µm Whatman Anodisc, 0.2 µm Sartorius Minisart, 0.45 µm Sartorius Minisart,
2.0 µm Millipore TTTP) and then tested for changes in immersion mode ice-nucleating
activity (Extended Data Fig. 2a). Selected NETCARE samples were
filtered through 0.2 µm polyethersulfone membranes (IC Acrodisc) and then
retested under cirrus conditions for ice nucleation activity (Extended Data Fig. 2b).
Selected Arctic and Atlantic samples were tested for immersion freezing activity
after having been heated in a temperature-controlled bath for 10 min. This was
repeated at eight temperatures between 20 °C and 100 °C (Extended Data Fig. 2c).
NETCARE samples BC coast SML3 and Pacific SML1 were heated for 10 min at
100 °C and retested under cirrus conditions for ice nucleation activity (Extended
Data Fig. 2b).
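The filtering and heating tests are compared in Extended Data Fig. 2 through T50, the temperature at which 50% of droplets had frozen, with error bars given by the standard deviation of the individual freezing temperatures. A minimal sketch, assuming T50 is taken as the median of the droplet freezing temperatures (one simple estimator; interpolating the fraction-frozen curve is an alternative). The function names are hypothetical.

```python
import statistics


def fraction_frozen(freezing_temps_c, temp_c):
    """Fraction of droplets already frozen on cooling to temp_c (i.e. that froze at or above it)."""
    return sum(t >= temp_c for t in freezing_temps_c) / len(freezing_temps_c)


def t50_summary(freezing_temps_c):
    """Return (T50, SD): the median droplet freezing temperature and the sample standard deviation."""
    return statistics.median(freezing_temps_c), statistics.stdev(freezing_temps_c)
```

A shift of T50 towards colder temperatures after filtering or heating indicates that the treatment removed or deactivated ice-nucleating material.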
Flow cytometry for ACCACIA SML samples. Samples (2 ml) were transferred
into a cryovial, and 50 µl of 50% glutaraldehyde was added to achieve a 0.5%
solution. The preserved sample was stored in the refrigerator for 30 min before
snap-freezing in liquid nitrogen for storage at −80 °C. Prior to analysis, samples
were thawed and the nucleic acid stain SYBR Green I, dissolved in 300 mM
potassium nitrate solution, was added to achieve a 1% concentration of the stain
(see refs 61 and 62 for more details). The samples were kept in the dark at room
temperature for 1 h. Bacterial abundance shown in Extended Data Fig. 4 was
quantified using a flow cytometer (Becton Dickinson FACScan) following methods
outlined in the literature®?.
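The conversion from cytometer event counts to abundance is not spelled out above. The sketch below assumes abundance is the number of events divided by the volume analysed, scaled up for the dilution caused by the 50 µl of fixative added to the 2 ml sample. The helper is hypothetical and ignores the stain addition and any bead-based volume calibration.

```python
def cells_per_ml(events: int, analysed_volume_ml: float,
                 sample_ml: float = 2.0, fixative_ml: float = 0.05) -> float:
    """Convert flow-cytometer events to cells per ml of the original (unfixed) sample.

    Hypothetical helper: corrects only for dilution by the fixative volume; any
    dilution from the stain addition is ignored.
    """
    if analysed_volume_ml <= 0:
        raise ValueError("analysed volume must be positive")
    dilution = (sample_ml + fixative_ml) / sample_ml  # 2.05 ml / 2.00 ml by default
    return events / analysed_volume_ml * dilution
```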
TOC, DOC and TEP characterization. Transparent exopolymer particle (TEP;
Extended Data Fig. 5b) concentrations were determined using spectrophotometric
methods®. Samples (5–50 ml) of NETCARE SML and SSW were filtered immediately
after sampling onto 0.2 µm polycarbonate membranes under low vacuum
pressure (<100 mm Hg), and the membranes were stained with 0.5 ml Alcian blue
solution (0.02 g Alcian blue in 100 ml of acetic acid solution, pH 2.5) and rinsed
twice with 1 ml of deionized water. Membranes were subsequently extracted in
6 ml of 80% sulfuric acid (H2SO4) for 2 h to dissolve the dye, and the absorbance of the
extracts was measured in a 1 cm cuvette at 787 nm, with standardization using
xanthan gum equivalents (Xeq) and conversion to µg C l−1 using a factor of 0.63
based on a compilation of multiple studies using phytoplankton cultures™.
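The absorbance-to-carbon conversion can be written out explicitly. This is a sketch assuming the commonly used form TEP = (A_sample − A_blank) × F / V, where F is a calibration factor obtained from the xanthan gum standards and V is the volume filtered. F, the blank absorbance and the function names are illustrative assumptions; only the 0.63 factor comes from the text.

```python
XANTHAN_TO_CARBON = 0.63  # µg C per µg xanthan gum equivalent (factor quoted in the text)


def tep_xeq_ug_per_l(abs_sample: float, abs_blank: float,
                     calibration_factor: float, volume_filtered_l: float) -> float:
    """TEP in µg Xeq per litre: blank-corrected A787 times calibration factor, per litre filtered.

    calibration_factor (µg Xeq per absorbance unit) is an assumed input derived
    from xanthan gum standards.
    """
    return (abs_sample - abs_blank) * calibration_factor / volume_filtered_l


def tep_carbon_ug_per_l(tep_xeq: float) -> float:
    """Convert xanthan gum equivalents to TEP carbon using the 0.63 factor."""
    return XANTHAN_TO_CARBON * tep_xeq
```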
For analysis of DOC® in the NETCARE samples from the North Pacific
(Extended Data Fig. 5a), subsamples of 20 ml were filtered through 0.2 µm
polycarbonate membranes and the filtrates preserved for later analysis with 4 µl
H3PO4 per ml sample. Concentrations of DOC were quantified using a Shimadzu
TOC-V analyser.

Organic carbon analysis was also performed on the Arctic microlayer samples
(Extended Data Fig. 7). After filtering through 0.2 µm pore size Whatman Anodisc
filters, DOC was measured using a Shimadzu TOC-V following previously described
methods®*’. Owing to their high carbon concentrations, the Arctic microlayer samples
for total organic carbon (TOC) analysis were diluted one part sample to two
parts deionized water. A total carbon analyser (TOC 5050A, Shimadzu) was used to
measure total organic and inorganic carbon in each sample twice (with a coefficient
of variation between measurements of <2%), and the average of these measurements
was taken.
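The dilution correction and duplicate check described above amount to a few lines of arithmetic. A minimal sketch, assuming the readings are in mg l−1 of the diluted sample and that the stated criterion (coefficient of variation below 2%) is applied to the two replicate readings. The function name is hypothetical.

```python
import statistics

DILUTION = 3.0  # one part sample + two parts deionized water, as described in the text


def toc_from_duplicates(measured_mg_per_l, max_cv=0.02):
    """Average replicate readings of a diluted sample and undo the 1:2 dilution.

    Raises ValueError if the coefficient of variation is not below max_cv.
    """
    mean = statistics.mean(measured_mg_per_l)
    cv = statistics.stdev(measured_mg_per_l) / mean
    if cv >= max_cv:
        raise ValueError(f"coefficient of variation {cv:.3f} is not below {max_cv}")
    return mean * DILUTION
```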

Atlantic samples were also analysed for DOC; 40 ml of sample was filtered
through a 25 mm GF/F filter and stored in a glass septum bottle. The resulting
filtrate was acidified to a pH of 1 with roughly 3 drops of pure HCl to react
with any inorganic carbon and to inhibit potential microbial degradation before
analysis in the DOC analyser (Shimadzu TOC-L, ±1% precision). Prior to
DOC collection, each septum bottle and the filtration apparatus were acid washed,
and GF/F filters were pre-combusted for 5 h at 450 °C. Particulate organic carbon
(POC) was measured in Atlantic samples by filtering SML through pre-combusted
25 mm GF/F filters (0.7 µm pore size) before analysis (filters were frozen at −20 °C
until processing).
Global modelling study. To assess the potential contribution of marine biogenic 
INP sources to the global atmospheric INP distribution, we combined our experi- 
mental data from immersion mode experiments using the Arctic and Atlantic 
SML samples (the fit shown in Fig. 2b) with a modelled distribution of emitted 
primary organic material in sea-spray aerosol’. This was compared to desert dust 
INP concentrations based on emissions of K-feldspar*”. 

Atmospheric marine POC distributions are taken from ref. 10. Briefly, organic
material (OM) in sea spray is related to emitted NaCl by an empirical, observationally
based enrichment factor $(\mathrm{OM}_{\mathrm{aerosol}}/\mathrm{NaCl}_{\mathrm{aerosol}})/(\mathrm{OM}_{\mathrm{sea}}/\mathrm{NaCl}_{\mathrm{sea}}) = 500$.
Furthermore, the OM fraction in emitted sea spray, $\mathrm{OM}_{\mathrm{aerosol}}/(\mathrm{OM}_{\mathrm{aerosol}} + \mathrm{NaCl}_{\mathrm{aerosol}})$,
was not allowed to exceed 76%, which was the maximum observed
submicrometre organic sea spray fraction according to ref. 68. Marine organic
carbon is emitted in proportion to climatological POC as retrieved by the
MODIS-Aqua satellite, transported as soluble particles of radius r = 100 nm in the
atmosphere, and removed by size-dependent wet and dry deposition processes. The OM
distribution (using OM = OC × 1.9) was shown previously” to be consistent with
another global model study of atmospheric marine OM® and is within a factor of
two of annual mean atmospheric measurements at Amsterdam Island (37°48′ S,
77°34′ E)® and Mace Head (53°20′ N, 9°54′ W)*.
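The enrichment factor and the 76% cap together fix the organic fraction of emitted spray for a given seawater composition. The sketch below assumes seawater OM and NaCl concentrations are supplied in the same mass units; how ref. 10 derives seawater OM from satellite POC is not reproduced, and the function names are hypothetical.

```python
ENRICHMENT_FACTOR = 500.0  # (OM/NaCl)_aerosol / (OM/NaCl)_sea, from the text
MAX_OM_FRACTION = 0.76     # upper limit on OM / (OM + NaCl) in emitted spray, from the text
OM_TO_OC = 1.9             # OM = OC x 1.9, from the text


def om_fraction_in_spray(om_sea: float, nacl_sea: float) -> float:
    """Organic mass fraction of emitted sea spray implied by the enrichment factor, capped at 76%."""
    ratio_aerosol = ENRICHMENT_FACTOR * om_sea / nacl_sea  # (OM/NaCl) in the aerosol
    return min(ratio_aerosol / (1.0 + ratio_aerosol), MAX_OM_FRACTION)


def oc_from_om(om: float) -> float:
    """Organic carbon mass corresponding to a given organic matter mass."""
    return om / OM_TO_OC
```

With a seawater OM:NaCl mass ratio of 1:10,000 the aerosol ratio is 0.05 and the organic fraction is about 4.8%; organic-rich water quickly reaches the 76% ceiling.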

The distribution of dust INPs was based on a previous study*’, which used the
global aerosol processes model (GLOMAP). In this study we used GLOMAP-mode”,
a two-moment aerosol size-resolving scheme, which calculates particle
mass and number in seven variable-size log-normal modes. The model is forced by
ECMWF 6-hourly global meteorological analyses and was run at a resolution of
2.8° × 2.8°, with 31 pressure levels extending to ~10 hPa. Dust emissions are taken
from the AEROCOM daily resolved dust inventory for 2000 (ref. 71). Emissions
are separated into a feldspar and a bulk component using a mineralogical inventory”.
Dust is emitted into the insoluble (or primary) size distribution and is aged
to the soluble distribution via condensation of SO4 and secondary organics, after
which it is subject to wet scavenging processes. Evaluation of modelled dust mass
concentrations against surface observations (from the University of Miami network)
shows a model bias of ~30%, with the majority of observations within a factor
of 2 of the modelled mass”.
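In a modal scheme each log-normal mode is described by its particle number, a median radius and a geometric standard deviation (assumed known for each mode here). Surface area and volume, which the INP calculation needs, then follow from the standard log-normal moment relation M_k = N r_g^k exp(k² ln²σ_g / 2). This is a generic sketch of that relation, not GLOMAP-mode code; the function names are hypothetical.

```python
import math


def lognormal_moment(number: float, median_radius: float, geo_std: float, k: int) -> float:
    """k-th radial moment of a log-normal number distribution: N r_g^k exp(k^2 ln^2(sigma_g) / 2)."""
    return number * median_radius ** k * math.exp(0.5 * k ** 2 * math.log(geo_std) ** 2)


def mode_surface_area(number: float, median_radius: float, geo_std: float) -> float:
    """Total surface area of the mode (4 pi times the second moment)."""
    return 4.0 * math.pi * lognormal_moment(number, median_radius, geo_std, 2)


def mode_volume(number: float, median_radius: float, geo_std: float) -> float:
    """Total volume of the mode (4/3 pi times the third moment)."""
    return 4.0 / 3.0 * math.pi * lognormal_moment(number, median_radius, geo_std, 3)
```

With geo_std = 1 the mode collapses to identical spheres, and the expressions reduce to 4πNr² and (4/3)πNr³.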

Mineral dust INP concentrations (originating from K-feldspar) were calculated
offline via the time-independent model as discussed previously*®. K-feldspar
(assumed to be 35% of the feldspar volume) surface area and particle number were
calculated assuming external mixing within the soluble modes.
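Under the time-independent (singular) description, the number of active INPs in a population of particles each with surface area A is N(1 − exp(−ns A)), where ns is the ice-nucleation active site density at the temperature of interest. The sketch below sums this over externally mixed modes. It assumes that K-feldspar's share of particle number equals its 35% volume share (true only if K-feldspar has the same size distribution as the other feldspar) and uses a mean area per particle rather than integrating over the size distribution; the container class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

K_FELDSPAR_VOLUME_SHARE = 0.35  # K-feldspar assumed to be 35% of feldspar volume (from the text)


@dataclass
class FeldsparMode:
    """Hypothetical per-mode summary for illustration."""
    feldspar_number_per_cm3: float  # feldspar particles in this soluble mode
    area_per_particle_cm2: float    # mean surface area of one particle


def k_feldspar_inp_per_cm3(modes, ns_per_cm2: float) -> float:
    """Singular INP estimate summed over externally mixed modes: N_K * (1 - exp(-ns * A))."""
    total = 0.0
    for mode in modes:
        # Assumption: number share of K-feldspar equals its volume share.
        n_kfs = K_FELDSPAR_VOLUME_SHARE * mode.feldspar_number_per_cm3
        total += n_kfs * (1.0 - math.exp(-ns_per_cm2 * mode.area_per_particle_cm2))
    return total
```

When ns A is small the expression reduces to N ns A, so INP numbers scale with the total K-feldspar surface area; at large ns A every K-feldspar particle is active.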

Figure 4d shows seasonal modelled concentrations of marine (colour scale) and
K-feldspar (contours) INPs that are active at the ambient model temperature. We
refer to this concentration as [INP]ambient; it is a useful indicator of locations in
which INP concentrations are sufficiently high and temperatures sufficiently
low to potentially influence clouds. Both marine and K-feldspar [INP]ambient were
calculated using averaged daily temperatures and averaged monthly marine
organic and K-feldspar emissions during the indicated periods (left panel,
December–January–February; right panel, June–July–August). In both cases the
parameterizations used are valid over a limited temperature range (see Fig. 2b for
marine INP, and ref. 30 for K-feldspar). We did not extrapolate INP concentrations
to temperatures above the upper limits of the parameterizations; instead, we
assumed the aerosol had no ice-nucleating activity at higher temperatures. Because
INP numbers are cumulative as temperature decreases, the INP concentration
at the lowest valid temperature of each parameterization represents a
lower limit for INP concentrations at temperatures colder than the valid range.
Below −38 °C we do not show [INP]ambient, since in this regime homogeneous
nucleation will dominate.
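The rules for temperatures outside each parameterization's valid range can be expressed as a small function. A sketch, assuming the parameterization is a callable returning a cumulative INP concentration at a temperature in °C; the valid-range bounds and the toy parameterization used in the check are hypothetical.

```python
def inp_ambient(temp_c, parameterization, t_min_c, t_max_c, homogeneous_limit_c=-38.0):
    """Apply the out-of-range temperature rules to a cumulative INP parameterization."""
    if temp_c < homogeneous_limit_c:
        return None                        # not shown: homogeneous freezing dominates
    if temp_c > t_max_c:
        return 0.0                         # warmer than valid range: no ice-nucleating activity assumed
    if temp_c < t_min_c:
        return parameterization(t_min_c)   # colder than valid range: value is a lower limit
    return parameterization(temp_c)
```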
Extended Data Figure 1 | Sampling locations. SML and SSW samples were
collected during the ACCACIA campaign (July–August 2013) at Arctic
sampling stations at the locations marked with solid red circles. Also shown are
sampling locations during the WACS II campaign (May–June 2014) in the
North Atlantic Ocean. NETCARE samples were collected at locations in the
Northeast Pacific (yellow star and green square, CCGS John P. Tully, 14–19
June 2013). The red diamond and blue asterisk correspond to the sampling
locations for the NETCARE British Columbia (BC) coastal samples (12–15
August 2013). The inset is a zoom of the BC coast sampling locations.
Extended Data Figure 2 | Effects of heating and filtering on the ice
nucleation activity of microlayer samples. a, The effect of filtering through
different pore-sized filters on the temperature at which 50% of droplets had
frozen (T50) of Arctic and Atlantic SML samples tested using the µl-NIPI. Error
bars represent ± the standard deviation calculated from the freezing
temperatures in each experiment, which consisted of between 30 and 53
individual events. Shaded grey area is the range of T50 found for fresh unfiltered
SSW during the campaign. b, Comparison of the UT-CFDC onset RHice of
unfiltered, filtered (0.2 µm) and heated (to 100 °C for 10 min) North Pacific and
symbols are as shown in Fig. 2d. The green symbols represent the filtered
onsets, whereas the black symbols represent the heated results. Ice-nucleation-onset
error bars represent one standard deviation based on three to four
replicates. c, Results of heating tests using Arctic and Atlantic SML samples on
T50 tested using the µl-NIPI. Error bars represent ± the standard deviation
calculated from the freezing temperatures in each experiment, which consisted
of between 28 and 46 individual events. Shaded grey area is the range of T50
found for fresh untreated Arctic SSW.
Extended Data Figure 3 | Ice surface site densities for the Pacific microlayer samples. Comparison of the ice surface densities (ns) calculated from UT-
SML ns values are indicated by the coloured symbols, whereas the mineral dust ns values are indicated by the grey symbols. The dark grey and light grey
Extended Data Figure 4 | Bacterial cell counts for Arctic samples.
a, Bacterial cell counts from flow cytometry performed on Arctic SSW (black
squares) and fresh Arctic SML (red circles). b, The SML sample cell counts plotted
against T50 (temperature at which 50% of droplets had frozen) and line of
best fit, R² = 0.29.
Extended Data Figure 5 | Correlation of TEP and DOC with the UT-CFDC RHice onsets. a, b, Ice nucleation RHice onsets for Northeast Pacific (see Extended
Data Table 1) samples plotted against measured DOC concentration (a) and TEP enrichment factor (b). Error bars represent the experimental uncertainty in
relative humidity with respect to ice in the UT-CFDC.
Extended Data Figure 6 | Ice-nucleating activity of diatom cells and their
exudates. WACIFE frozen fraction curves derived from 60–129-µm-sized
droplets (~0.4 nl volume) as a function of temperature. Green symbols indicate
diatom exudates in 0.1-µm-filtered sea water. Blue and red symbols
represent 0.1-µm-filtered sea water devoid of exudates with and without the
addition of growth media, respectively. All temperatures have been corrected
for freezing point depression to pure water conditions from their initial
aqueous solution water activity, aw = 0.985 (open circles), 0.97 (open squares),
0.96 (filled diamonds), 0.95 (open diamonds), 0.94 (filled circles), 0.925
(open triangles), 0.90 (asterisks). Shaded areas illustrate ranges of observed
heterogeneous ice nucleation of intact and fragmented diatom cells (green) and
homogeneous ice nucleation of aqueous NaCl droplets (grey) for similar aw
values**’. Error bars represent the instrumental uncertainty in temperature
measurement. Predicted homogeneous freezing temperatures for similar-sized
water droplets are indicated by the grey bar’.
Extended Data Figure 7 | TOC and DOC measurements for Arctic samples. Arctic SML TOC and DOC measurements and Arctic SSW DOC measurements.
TOC error bars represent the measured 2% coefficient of variation. DOC sample error was calculated as the coefficient of variation from the mean and standard
deviation of three sample replicates. For comparison, the Atlantic TOC measurements are: Atlantic SML1 = 5.954 ± 0.185 mg l−1, Atlantic
SML2 = 4.643 ± 0.135 mg l−1.
Extended Data Figure 8 | µl-NIPI freezing curves for Arctic and Atlantic samples uncorrected for freezing depression caused by salts. Fraction frozen curves for 1 µl droplet freezing experiments using Arctic and Atlantic ocean samples, uncorrected for freezing point depression. SML, SSW and boat flushing water (grey points; symbols correspond to those for SML sampled at the same locations) to check for the absence of contaminant INPs before sampling.
Extended Data Figure 9 | Summary of ice nucleation experimental setups. a, Pipetting 1 µl droplets onto a hydrophobic glass slide. b, Schematic of the µl-NIPI cold stage used for immersion mode droplet freezing experiments. c, Schematic of the experimental setup for cirrus cloud relevant experiments. CPC, condensation particle counter; DMA, differential mobility analyser; OPC, optical particle counter. d, Schematic of the water-activity-controlled immersion freezing experiment (WACIFE) for freezing of micrometre-sized droplets containing diatom exudates as a function of water activity, aw, and relative humidity, RH. Images are not to scale. The procedure for preparing and freezing droplets of filtered and autoclaved natural sea water with and without added f/2 nutrients is similar, except that 0.1 µm filtration is not required.
| Campaign | Date | Time (UTC) | Location (see Extended Data Fig. 1) | Longitude | Latitude | SML sample | SSW sample / sampling depth |
|---|---|---|---|---|---|---|---|
| ACCACIA | 19/07/2013 | 08:22 | 3 | 20°41.700W | 70°13.494N | Arctic SML3 | Arctic SSW3 / 2 m |
| ACCACIA | 21/07/2013 | 07:56 | 5 | 20°18.336W | 71°53.692N | Arctic SML5 | Arctic SSW5 / 5 m |
| ACCACIA | 22/07/2013 | 08:00 | 6 | 13°06.120W | 73°06.340N | Arctic SML6 | Arctic SSW6 / 2 m |
| ACCACIA | 24/07/2013 | 08:02 | 8 | 08°43.511W | 75°02.956N | Arctic SML8b | Arctic SSW8 / 2 m |
| ACCACIA | 25/07/2013 | 08:07 | 9 | 15°31.311W | 76°31.902N | Arctic SML9 | Arctic SSW9 / 2 m |
| ACCACIA | 26/07/2013 | 14:24 | 10 | 5°18.642W | 76°16.141N | Arctic SML10 | Arctic SSW10 / 2 m |
| ACCACIA | 27/07/2013 | 09:19 | 11 | 7°04.645W | 77°57.427N | Arctic SML11 | Arctic SSW11 / 2 m |
| ACCACIA | 28/07/2013 | 06:15 | 12 | 7°01.993W | 78°53.663N | Arctic SML12 | Arctic SSW12 / 2 m |
| ACCACIA | 29/07/2013 | 08:50 | 12.5 | 5°13.61W | 77°27.207N | Arctic SML12.5 | Arctic SSW12.5 / 2 m |
| ACCACIA | 30/07/2013 | 12:01 | 13 | 7°35.043E | 74°48.828N | Arctic SML13 | Arctic SSW13 / 2 m |
| ACCACIA | 03/08/2013 | 10:20 | 16 | 23°56.620E | 80°08.900N | Arctic SML16 | Arctic SSW16 / 2 m |
| ACCACIA | 04/08/2013 | 07:58 | 17 | 33°43.967E | 83°18.630N | Arctic SML17 | Arctic SSW17 / 0.4 m |
| ACCACIA | 04/08/2013 | 20:48 | 17.5 | 33°43.231E | 83°18.381N | Arctic SML17.5 | Arctic SSW17.5 / 2 m |
| ACCACIA | 05/08/2013 | 07:53 | 18 | 26°07.684E | 82°41.5N | Arctic SML18 | Arctic SSW18 / 2 m |
| ACCACIA | 06/08/2013 | 07:59 | 19 | 34°49.928E | 81°00.1N | Arctic SML19 | Arctic SSW19 / 0.3 m |
| WACS II | 23/05/2014 | 16:13 | 1 | 62.3256W | 40.41335N | Atlantic SML1 | Atlantic SSW1 / 5 m |
| WACS II | 23/05/2014 | 18:20 | 1b | 62.195878W | 40.399563N | Atlantic SML1b | – |
| WACS II | 26/05/2014 | 18:00 | 2 | 61.672607W | 42.356586N | Atlantic SML2 | Atlantic SSW2 / 5 m |
| WACS II | 04/06/2014 | 20:30 | 5 | 70.5252W | 40.68428N | Atlantic SML5 | Atlantic SSW5 / 0.5 m |
| NETCARE | 12/08/2013 | – | BC coast | 125° 54W | 48°93N | BC coast SML1 | BC coast SSW1 / 0.5 m |
| NETCARE | 12/08/2013 | – | BC coast | 125° 54W | 48°93N | BC coast SML2 | BC coast SSW2 / 0.5 m |
| NETCARE | 15/08/2013 | – | BC coast | 125°33W | 48°54N | BC coast SML3 | – |
| NETCARE | 15/08/2013 | – | BC coast | 125°33W | 48°54N | BC coast SML4 | – |
| NETCARE | 14/06/2013 | – | NE Pacific | 138°40.0W | 49°34.0N | Pacific SML1 | – |
| NETCARE | 16/06/2013 | – | NE Pacific | 145°00.0W | 50°00.0N | Pacific SML2 | – |
| NETCARE | 16/06/2013 | – | NE Pacific | 145°00.0W | 50°00.0N | Pacific SML3 | Pacific SSW3 / 1 m |

Details are given of samples collected during the ACCACIA Arctic cruise, the WACS II Atlantic cruise and the NETCARE project. Northeast (NE) Pacific samples were collected as part of the Line P time series, cruise 2013-2017. British Columbia (BC) coast samples were collected in Terrace Bay on the western coast of Vancouver Island (Canada) and at a location approximately 3 km offshore from Ucluelet. Location numbers relate to the maps shown in Extended Data Fig. 1.
Evolutionary origin of the turtle skull


G. S. Bever, Tyler R. Lyson, Daniel J. Field & Bhart-Anjan S. Bhullar


Transitional fossils informing the origin of turtles are among the 
most sought-after discoveries in palaeontology’*. Despite strong 
genomic evidence indicating that turtles evolved from within the
diapsid radiation (which includes all other living reptiles®’), evidence
of the inferred transformation from an ancestral turtle
with an open, diapsid skull to the closed, anapsid condition of
modern turtles remains elusive. Here we use high-resolution computed
tomography and a novel character/taxon matrix to study the
skull of Eunotosaurus africanus, a 260-million-year-old fossil rep- 
tile from the Karoo Basin of South Africa, whose distinctive post- 
cranial skeleton shares many unique features with the shelled body 
plan of turtles” *. Scepticism regarding the status of Eunotosaurus 
as the earliest stem turtle arises from the possibility that these 
shell-related features are the products of evolutionary convergence. 
Our phylogenetic analyses indicate strong cranial support for 
Eunotosaurus as a critical transitional form in turtle evolution, 
thus fortifying a 40-million-year extension to the turtle stem and 
moving the ecological context of its origin back onto land*’. 
Furthermore, we find unexpected evidence that Eunotosaurus is 
a diapsid reptile in the process of becoming secondarily anapsid. 
This is important because categorizing the skull based on the 
number of openings in the complex of dermal bone covering the 
adductor chamber has long held sway in amniote systematics”, and 
still represents a common organizational scheme for teaching the 
evolutionary history of the group. These discoveries allow us to 
articulate a detailed and testable hypothesis of fenestral closure 
along the turtle stem. Our results suggest that Eunotosaurus
represents a crucially important link in a chain that will eventually
lead to consilience in reptile systematics, paving the way for synthetic
studies of amniote evolution and development.

[Figure 1 panel labels: Anapsid skull as plesiomorphy; Anapsid skull as synapomorphy]

At least 270 million years® of evolution within the reptile crown 
clade has produced a panoply of cranial forms. From the hyperkinetic 
anatomy of snakes to the encephalized and highly visual architecture of 
birds, the reptile skull is an increasingly popular model for understand- 
ing the evolution and development of vertebrate adaptation’. 
Turtles are an important yet enigmatic piece of this evolutionary puz- 
zle. The earliest uncontroversial stem turtles’* exhibit an anapsid skull 
with an adductor chamber concealed by bone (Fig. 1). Although emar- 
gination has modified this dermal covering in the vast majority of 
crown-group turtles”, the absence of temporal fenestration is a feature 
shared by currently recognized crown- and stem-group turtles, the 
immediate fossil outgroups of the amniote crown clade, and many 
early reptiles (sauropsids)’°. If this absence reflects conservation of 
the ancestral amniote condition, then turtles are an extant remnant 
of an early reptile radiation that excludes the other living forms (tua- 
tara, lizards, snakes, crocodilians, birds). If, however, turtles are nested 
within the radiation of anatomically diapsid reptiles, which includes 
both the diapsid crown group and a majority of its stem lineage’’, then 
the anapsid skull of turtles is a secondary configuration built on an 
ancestrally diapsid structural plan. Despite the strong support that a 
diapsid origin of turtles enjoys from genomic data sets*’, no direct 
palaeontological evidence yet exists for the loss of a diapsid skull along 
the turtle stem. This situation epitomizes a general discord between 
the fossil record and the molecular signature of living taxa that 


Figure 1 | Competing hypotheses for the
origin of the anapsid skull of modern turtles.
Historically, this closed condition was accepted as
the conservation of the ancestral amniote state,
with turtles originating among early, long-extinct
forms’*’*. More recent analyses largely reject this
hypothesis in favour of a turtle origin within crown-group
Diapsida'*’’, a radiation that includes modern
lizards, snakes, tuatara, crocodilians, and birds, and
that is generally characterized by a skull with upper
and lower temporal fenestrae (UTF and LTF,
respectively).
currently obfuscates attempts to synthesize broad evolutionary patterns
across Reptilia’®.

Eunotosaurus africanus Seeley is an approximately 260-million- 
year-old fossil reptile from South Africa’’ that shares a number of 
uniquely derived postcranial features with turtles—features that 
appear to inform the evolutionary origin of the iconic turtle shell 
and the highly derived mechanism by which turtles ventilate their 
lungs**. The character-rich skull is an obvious source for testing these 
homology claims, but cranial details for Eunotosaurus remain scant. 
Existing studies stress the lack of cranial support for a privileged 
Eunotosaurus–turtle relationship’*~°, thus establishing a cranial–postcranial
conflict that parallels the ongoing genotypic–phenotypic
phylogenetic dispute. Here we use high-resolution computed 
tomography and phylogenetic analyses to: (1) examine the skull of 
Eunotosaurus; (2) test the current hypothesis that cranial data do 
not support this taxon as an early stem turtle; and (3) formulate an 
evolutionary model (predictive series of evolutionary steps) for the 
origin of the anapsid skull of modern turtles. 

The skull of Eunotosaurus is relatively short and wide (Figs 2, 3 and 
Extended Data Figs 1-3). Its compact snout bears approximately 23 
robust, subthecodont marginal teeth and nasals that are longer than 
the frontals. The palate includes an unfused basicranial articulation, an 
Figure 2 | The adult skull of the early stem turtle Eunotosaurus africanus.
a-c, Dorsal (a), left lateral (b), and ventral (c) views of the digitally segmented
skull of CM777. d, Reflected right lateral view of CM 86-341. e, Digitally
rendered and reflected right lateral view of the temporal region of CM 86-341.
f, Composite reconstruction of dorsal (top) and left lateral (bottom) views. an,
angular; ar, articular; bo, basioccipital; bs, parabasisphenoid; co, coronoid; d,
dentary; ep, epipterygoid; ex, exoccipital; fr, frontal; ju, jugal; la, lacrimal; ls,
laterosphenoid; mx, maxilla; na, nasal; op, opisthotic; pa, parietal; pra,
prearticular; pf, prefrontal; pfr, postfrontal; po, postorbital; pp, postparietal; pr,
prootic; pt, pterygoid; q, quadrate; qj, quadratojugal; sa, surangular; so,
supraoccipital; sp, splenial; sq, squamosal; st, supratemporal; t?, putative
tabular; v, vomer.
absence of basioccipital tubera, a laterally angled transverse process of
the pterygoid, an edentulous ectopterygoid, a long interpterygoid 
vacuity, and a moderately sized suborbital fenestra. The gracile lower 
jaw exhibits a prominent coronoid process, a surangular that contri- 
butes to the articular surface, a splenial that does not participate in the 
mandibular symphysis, and 18 teeth. 

The cheek is open with a single, large fenestra. The fenestra is 
defined anteriorly by an expanded postorbital and tall, comma-shaped 
jugal. Its posterior margin is formed by an elongate, vertically oriented 
squamosal and quadratojugal. The quadratojugal is especially tall, 
spanning the full height of the cheek. A lower temporal arcade is not 
present, leaving the cheek open ventrally. The roof of the adductor 
chamber in a juvenile skull is marked by a distinct upper temporal 
fenestra (on both sides; Extended Data Fig. 2), previously unrecognized
and identical to the unique upper temporal fenestra in uncontroversial
diapsid reptiles, which is separated from the lower fenestra
by a complete upper temporal arcade (Fig. 3). In the adult, the upper
fenestra is covered by a distinctly elongate supratemporal that contacts
the postorbital and postfrontal anteriorly (Fig. 2). Digital removal of
the supratemporal reveals that the upper fenestra is retained in the adult,
though its diameter is reduced through expansion of the surrounding
elements—most notably the postorbital and squamosal (Fig. 3). The 
relatively late ontogenetic expansion of these bones also reduces the 
circumference of the lower temporal fenestra and modifies its rounded 
shape. The adductor chamber is closed posteriorly by what appears to 
be an expanded tabular, though this bony plate may represent a poster- 
olateral flange of the parietal. The putative tabular fills a space corres- 
ponding to what would otherwise be a moderately sized post-temporal 
fenestra (Extended Data Fig. 3). 

Phylogenetic analyses employing maximum parsimony and
Bayesian optimality criteria were performed on both the complete
character matrix and one restricted to cranial features; all of these analyses
strongly support an exclusive Eunotosaurus–turtle clade (Extended
Data Figs 4-7). Examples of turtle features expressed in the skull of 
Eunotosaurus include marked preorbital shortening, relative shorten- 
ing of the frontals, prootic-quadrate contact, and an anteriorly placed 
craniomandibular joint’’. The prootic also has an anterior contact with 
a plate-like ossification of the primary braincase wall (Extended Data 




Figure 3 | The body plan of the early stem turtle Eunotosaurus africanus.
a, b, The postcranium (a) and skull (b) of a juvenile Eunotosaurus africanus
(SAM-PK-K7909) in dorsal and right lateral (reflected) views, respectively.
c, d, The digitally rendered adult skull of Eunotosaurus in dorsolateral
(CM777; c) and dorsal (CM 86-341; d) views, with the supratemporal bone
digitally removed in each. The UTF, clearly expressed in the juvenile, is retained
in the adult but covered by the postnatal development of an elongate
supratemporal bone. The size of the UTF and LTF of adult Eunotosaurus is
reduced through late-stage ontogenetic expansion of the surrounding dermal
elements. Abbreviations as in Figs 1 and 2. Scale bars: a-c, 1 cm; d, 5 mm.
Figs 1 and 3) that corresponds to a rare combination of ossified
orbital cartilages present at least in Proganochelys quenstedti among 
stem turtles”. 
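Maximum parsimony, one of the two optimality criteria used in these analyses, scores a tree by the minimum number of character-state changes it requires. For a single character on a rooted binary tree, that count can be obtained with Fitch's algorithm, sketched below as a generic illustration. The tree, taxa and states are hypothetical and unrelated to the matrix analysed here; parsimony software sums such scores over all characters and searches across many topologies.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree of nested 2-tuples.

    Returns (candidate state set at this node, minimum number of changes in the subtree).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes needed
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change at this node
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # disagreement costs one change
```

On the tree ((A, B), (C, D)), a character with states 0, 0, 1, 1 needs one change, whereas states 0, 1, 0, 1 need two; the first pattern therefore counts as support for grouping A with B.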

These data resolve the cranial–postcranial conflict in that both modules
now support Eunotosaurus as an early stem turtle, thus bolstering
evidence for the turtle stem in the terrestrial ecosystem of Gondwana
approximately 260 million years ago (Supplementary Tables 1-3).
Considering that Eunotosaurus is widely accepted as lying outside
the pandiapsid radiation, this shared signal seems only to exacerbate
the phylogenetic gap between the phenotypic and genotypic explanations
of turtle origins. Although neither our parsimony
nor our Bayesian analyses recover the dominant molecular
solution of a unique turtle–archosaur relationship, both approaches
do agree that turtles arose somewhere within the greater diapsid radiation
(Extended Data Figs 4 and 5). Our morphological results remain
ambiguous as to whether that origin is nested within the diapsid crown 
clade or among those stem forms expressing a morphologically diapsid 
skull. The lack of clear morphological support for the refined position 
of turtles within Eureptilia probably reflects some combination of the 
rate at which these stem lineages diversified and our poor understand- 
ing of their respective fossil records”. For example, the early lepidosaur 
stem comprises comparatively few taxa, and the most conservative 
stem archosaurs bear a striking resemblance to apparent proximal 
stem diapsids, throwing into question the sequence of acquisition 
and degree of variability of crown diapsid autapomorphies”’. 

The presence of a lower temporal fenestra in Eunotosaurus supports 
the hypothesis that the characteristically closed cheek of modern tur- 
tles is a secondary condition that evolved through expansion of the 
jugal, quadratojugal, and squamosal*. Singular expression of a lower 
temporal fenestra once unambiguously diagnosed mammals and their 
stem lineage’® but is now recognized in a number of phylogenetically 
disparate ‘anapsid’ parareptiles*, and may represent the ancestral 
condition for the amniote crown”. This observed pattern of concen- 
trated homoplasy near the evolutionary origin of a character state is 
congruent with the concept of a protracted zone of variability that may 
commonly confound attempts to resolve the early history of clades and 
character systems”. 

The amniote upper temporal fenestra has enjoyed a famously 
uncomplicated history, being a nearly consistent fixture of the diapsid 
body plan since its first appearance in the Carboniferous**. The combined
morphologies expressed in the juvenile and adult specimens of
Eunotosaurus provide not only the earliest direct evidence of an upper
temporal fenestra in a putative stem turtle, but the first evidence for 
how that fenestra may have closed before the origin of the turtle crown 
clade. This evidence supports a model of temporal closure whose initial 
steps include a significant expansion of the dermal roof (postorbital, 
squamosal, and probably parietal) that first constricts and then closes 
the upper temporal fenestra (Fig. 4). Such steps are expected compo- 
nents of any hypothesis where turtles evolve from a diapsid ancestor. 
The addition of Eunotosaurus to this model is significant because it 
provides the first empirical evidence of these transitional expansions, 
which in turn allows the timing of these transformations to be 
anchored within the geological history of diapsids. Eunotosaurus also 
provides insight into the possible role of developmental timing in 
producing the modern anapsid condition. Phylogenetic acceleration 
(peramorphosis) of the inferred postnatal trajectory of dermal expan- 
sion around the temporal fenestrae of Eunotosaurus may explain the 
transition from a Eunotosaurus-like morphology to that expressed in 
Odontochelys, Proganochelys, and more crown-ward turtles. 

The role of the supratemporal is an aspect of the Eunotosaurus 
model of temporal closure that would not have been predicted based 
on earlier studies. An anteriorly expanded supratemporal that devel- 
ops late in postnatal development to cover the upper temporal fenestra 
must currently be considered an autapomorphy of Eunotosaurus. 
Future fossil discoveries will determine whether an expanded supra- 
temporal was ancestral to crown-ward turtles, but it is important to 
stress that an expanded supratemporal is not a necessary component 
of an evolutionary model of fenestral closure in turtles that has 
Eunotosaurus as a central figure. For example, an analogous secondary 
expansion of the supratemporal partially or completely covers the 
upper temporal fenestra of the marine thalattosaurs”. Moreover, the 
construction of the modern turtle skull played out over a time span of 
tens of millions of years, and it is well attested that dermal bones in 
major vertebrate lineages can shift back and forth considerably relative 
to the underlying tissues on these timescales; for instance in the com- 
plex history of the synapsid skull and the shoulder girdle of sarcopter- 
ygians (and tetrapods, most notably turtles themselves”). 

It is thus evident that the turtle skull, like the turtle postcranium, 
underwent profound modifications during its history that similarly 
obscured anatomical evidence for phylogenetic affinities by the time 
the crown-group condition was reached. The ecological context in
which the earliest stem turtles lost their upper temporal fenestra is
unclear, although the upper fenestra in extant diapsids relates to the
bulging of the pseudotemporalis muscle mass. It is thus likely that
fenestral closure along the turtle stem had implications for the
masticatory apparatus, and these implications may be reflected in the
various modifications of the rostral portion of the skull.

Figure 4 | Generalized amniote phylogeny showing sequence of major
transformations in the origin of the turtle skull. (1) Ancestral crown
amniote retains anapsid condition of amniote stem. (2) and (3) Lower (LTF)
and upper (UTF) temporal fenestrae appear, producing a fully diapsid skull.
(4) Loss of lower temporal bar results in ventrally open LTF. (5) Size of UTF
and LTF constricted through postnatal expansion of surrounding dermal
elements (most notably postorbital and squamosal). Superficial covering of
the UTF by the supratemporal, as expressed in Eunotosaurus, may or may not
be ancestral to modern turtles. (6) Closure of LTF and UTF,


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 28 April; accepted 3 July 2015. 
Published online 2 September 2015. 


1. Li, C., Wu, X.-C., Rieppel, O., Wang, L.-T. & Zhao, L.-J. An ancestral turtle from the Late
Triassic of southwestern China. Nature 456, 497-501 (2008).

2. Lyson, T. R., Bever, G. S., Bhullar, B.-A. S., Joyce, W. G. & Gauthier, J. A. Transitional
fossils and the origin of turtles. Biol. Lett. 6, 830-833 (2010).

3. Lyson, T. R., Bever, G. S., Scheyer, T. M., Hsiang, A. Y. & Gauthier, J. A. Evolutionary
origin of the turtle shell. Curr. Biol. 23, 1113-1119 (2013).

4. Lyson, T. R. et al. Origin of the unique ventilatory apparatus of turtles. Nat. Commun.
5, 5211 (2014).

5. Schoch, R. R. & Sues, H.-D. A Middle Triassic stem-turtle and the evolution of the
turtle body plan. Nature 523, 584-587 (2015).

6. Wang, Z. et al. The draft genomes of soft-shell turtle and green sea turtle yield
insights into the development and evolution of the turtle-specific body plan. Nature
Genet. 45, 701-706 (2013).

7. Field, D. J. et al. Toward consilience in reptile phylogeny: miRNAs support an
archosaur, not lepidosaur, affinity for turtles. Evol. Dev. 16, 189-196 (2014).

8. Joyce, W. G. & Gauthier, J. A. Palaeoecology of Triassic stem turtles sheds new light
on turtle origins. Proc. R. Soc. Lond. B 271, 1-5 (2004).

9. Scheyer, T. M. & Sander, P. M. Shell bone histology indicates terrestrial
palaeoecology of basal turtles. Proc. R. Soc. Lond. B 274, 1885-1893 (2007).

10. Gauthier, J., Kluge, A. G. & Rowe, T. Amniote phylogeny and the importance of
fossils. Cladistics 4, 105-209 (1988).

11. Cundall, D. & Irish, F. in Biology of the Reptilia Vol. 20 (eds Gans, C., Gaunt, A. S. &
Adler, K.) 349-692 (Society for the Study of Amphibians and Reptiles, 2008).

12. Bhullar, B.-A. S. et al. Birds have paedomorphic dinosaur skulls. Nature 487,
223-226 (2012).

13. Gaffney, E. S. The comparative osteology of the Triassic turtle Proganochelys. Bull.
Am. Mus. Nat. Hist. 194, 1-263 (1990).

14. Werneburg, I. Temporal bone arrangements in turtles: an overview. J. Exp. Zool. B
318, 235-249 (2012).

15. Müller, J. in Recent Advances in the Origin and Early Radiation of Vertebrates (eds
Arratia, G., Wilson, M. V. H., Wilson, R. & Cloutier, R.) 379-408 (Verlag Dr. Friedrich
Pfeil, 2004).

16. Lee, M. S. Y. Turtle origins: insights from phylogenetic retrofitting and molecular
scaffolds. J. Evol. Biol. 26, 2729-2738 (2013).

17. Day, M., Rubidge, B., Almond, J. & Sifelani, J. Biostratigraphic correlation in the
Karoo: the case of the Middle Permian parareptile Eunotosaurus: research letter.
S. Afr. J. Sci. 109, 1-4 (2013).


242 | NATURE | VOL 525 | 10 SEPTEMBER 2015 


18. Cox, C. B. The problematic Permian reptile Eunotosaurus. Bull. Br. Mus. Nat. Hist. 18,
165-196 (1969).

19. Keyser, A. W. & Gow, C. E. First complete skull of the Permian reptile Eunotosaurus
africanus Seeley. S. Afr. J. Sci. 77, 417-420 (1981).

20. Gow, C. E. A reassessment of Eunotosaurus africanus Seeley (Amniota: Parareptilia).
Palaeont. Afr. 34, 33-42 (1997).

21. Bhullar, B.-A. S. & Bever, G. S. An archosaur-like laterosphenoid in early turtles
(Reptilia: Pantestudines). Breviora 518, 1-11 (2009).

22. Reisz, R. R., Modesto, S. P. & Scott, D. M. A new Early Permian reptile and its
significance in early diapsid evolution. Proc. R. Soc. Lond. B 278, 3731-3737
(2011).

23. Bickelmann, C., Müller, J. & Reisz, R. R. The enigmatic diapsid Acerosodontosaurus
piveteaui (Reptilia: Neodiapsida) from the Upper Permian of Madagascar and the
paraphyly of “younginiform” reptiles. Can. J. Earth Sci. 46, 651-661 (2009).

24. Müller, J. Early loss and multiple return of the lower temporal arcade in diapsid
reptiles. Naturwissenschaften 90, 473-476 (2003).

25. Tsuji, L. A. & Müller, J. Assembling the history of the Parareptilia: phylogeny,
diversification, and a new definition of the clade. Fossil Rec. 12, 71-81 (2009).

26. Piñeiro, G., Ferigolo, J., Ramos, A. & Laurin, M. Cranial morphology of the Early
Permian mesosaurid Mesosaurus tenuidens and the evolution of the lower
temporal fenestration reassessed. C. R. Palevol 11, 379-391 (2012).

27. Bever, G. S., Gauthier, J. A. & Wagner, G. P. Finding the frame shift: digit loss,
developmental variability, and the origin of the avian hand. Evol. Dev. 13, 269-279
(2011).

28. Reisz, R. R. A Diapsid Reptile from the Pennsylvanian of Kansas (Univ. of Kansas,
1981).

29. Rieppel, O. Clarazia and Hescheleria: a re-investigation of two problematical
reptiles from the Middle Triassic of Monte San Giorgio (Switzerland). Palaeontogr.
Abt. A 195, 101-129 (1987).

30. Lyson, T. R. et al. Homology of the enigmatic nuchal bone reveals novel
reorganization of the shoulder girdle in the evolution of the turtle shell. Evol. Dev.
15, 1-9 (2013).


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank J. Botha-Brink, E. Butler, S. Kaal, E. De Kock, J. Neveling 
and R. Smith for access to Eunotosaurus specimens. M. Fox and Z. Erasmus prepared 
fossil material. M. Colbert, J. Maisano, M. Hill and J. Thostenson are acknowledged for 
their help with the digital data. We thank A. Balanoff, D. Dykes, J. Gauthier, R. Hill,
B. Rubidge, R. Smith and K. de Queiroz for helpful discussions. G.S.B. extends special
thanks to the Academic Technologies Group at NYIT for their support in the digital 
visualization of anatomical data. 


Author Contributions G.S.B. designed the study, processed the CT data, performed the 
analytical work, constructed the figures, and wrote the paper. T.R.L. performed 
analytical work, assisted in writing the paper, and assisted with figures. D.J.F. and B.-A.S.B.
performed analytical work and assisted in writing the paper.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to G.S.B. (gbever@nyit.edu). 


©2015 Macmillan Publishers Limited. All rights reserved 


METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Specimens and CT scanning. Although Eunotosaurus africanus is known from a 
surprisingly large number of specimens (n > 44), relatively few of these include 
cranial material (see below). Our study was built largely around the adult mor- 
phology of CM777 and CM86-341, and the juvenile features of SAM-PK-K7909. 
NMQR3299 was also examined, but relatively poor preservation restricted its 
contribution to our cranial assessments. The skull of CM 777 was scanned at 
the University of Texas High-Resolution X-ray CT Facility (UTCT). Scanning 
was performed using no filter, an air wedge, a voltage of 200 kV and a current 
of 0.17 mA. The resulting images were then processed to remove ring
artefacts. The specimen was scanned along the coronal axis for a total of 1,003
slices with an image resolution of 1024 × 1024 pixels and a reconstructed field of
view of 34 mm. Voxel size (mm) is 0.03657 × 0.03320 × 0.03320. Additional images are available at DigiMorph
(http://www.digimorph.org/specimens/Eunotosaurus_africanus). Original slice data 
are available on request. 

NMQR3299 (skull and postcranial skeleton) was scanned at UTCT with no 
filter, an air wedge, a voltage of 200 kV, and a current of 0.19 mA. Ring artefacts 
were removed. Scanning was performed along the coronal axis for a total of 1,764 
slices with a resolution of 1024 × 1024 pixels and a reconstructed field of view of
62 mm. Voxel size (mm) is 0.06065 × 0.06065 × 0.06618. No digital segmentation
of this data set was performed. 

CM86-341 was scanned at the American Museum of Natural History 
Microscopy and Imaging Facility (AMNH MIF) using a copper filter, an air wedge, 
a voltage of 150 kV, and a current of 124 µA. The specimen was scanned along
the coronal axis for a total of 950 slices with a resolution of 1024 × 1024 pixels
and a reconstructed field of view of 35 mm. Voxel size (mm) is 0.03403 ×
0.03403 × 0.03403. Digital segmentation of the recognizable cranial elements
was performed using VGStudio MAX 2.1.
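The scan geometry above can be cross-checked with simple arithmetic. The sketch below is ours, not part of the authors' pipeline. It assumes that the in-plane pixel size equals the reconstructed field of view divided by the matrix width, and that the voxel dimension differing from the other two (listed first for CM 777 and last for NMQR3299) is the slice spacing along the scanning axis; the `Scan` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """Hypothetical record of the CT acquisition parameters quoted in the Methods."""
    specimen: str
    slices: int           # number of reconstructed slices
    matrix: int           # in-plane image width in pixels
    fov_mm: float         # reconstructed field of view (mm)
    slice_step_mm: float  # voxel size along the scanning axis (an assumption, see text)

    @property
    def pixel_mm(self) -> float:
        # In-plane pixel size: field of view divided by the pixels across it.
        return self.fov_mm / self.matrix

    @property
    def extent_mm(self) -> float:
        # Total length imaged along the scanning (coronal) axis.
        return self.slices * self.slice_step_mm


scans = [
    Scan("CM 777", 1003, 1024, 34.0, 0.03657),
    Scan("NMQR3299", 1764, 1024, 62.0, 0.06618),
    Scan("CM86-341", 950, 1024, 35.0, 0.03403),
]

for s in scans:
    print(f"{s.specimen}: in-plane ~{s.pixel_mm:.5f} mm, extent ~{s.extent_mm:.1f} mm")
```

For CM 777 the computed in-plane size (about 0.03320 mm) matches the stated voxel size; for the other two scans it agrees only to within about 0.5%, presumably because the quoted fields of view are rounded. Under the same assumption the three scans cover roughly 37, 117 and 32 mm along the scanning axis.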

This study also includes novel morphological data derived from the recent
physical preparation of two specimens of Eunotosaurus (using a small PaleoTools
microjack and a microscope). CM86-341 was prepared in 2010-2012 by M. Fox
(Peabody Museum, Yale University) under the direction of J. Gauthier. SAM-PK- 
K7670 was prepared in 2014-2015 by Z. Erasmus (Iziko Museums of South Africa) 
under the direction of R. Smith. 
List of examined cranial specimens of Eunotosaurus africanus. CM86-341: 
beautifully preserved partial skull, completely articulated neck with a few cervical 
ribs, and complete carapace (nine dorsal vertebrae and nine pairs of dorsal ribs) 
(Fig. 2 and Extended Data Fig. 1). CM777: articulated skull, neck, elongate cervical 
ribs, shoulder girdle, limb elements, and cranial half of carapace including dorsal 
vertebrae and ribs (Fig. 2 and Extended Data Figs 2 and 3). Unnumbered CM 
specimen figured in ref. 19. NMQR3299: mostly complete skeleton including 
articulated skull (Fig. 2). NMQR3474: impression of a mostly articulated skeleton, 
including the skull. SAM-PK-K7670: highly weathered nodule with mostly
complete skeleton including the cranial two-thirds of the dorsal ribs and vertebrae,
impressions of the cervical vertebrae, and an impression of the skull. SAM-PK-K7909:
weathered nodule with complete shell and shoulder girdle, articulated neck, and
essentially complete and articulated skull (Fig. 3). Identified as a juvenile on the
basis of its small size and the expression of numerous features indicative of skeletal
immaturity in reptiles (for example, an unfused scapula and coracoid)**”.

Institutional abbreviations used are as follows: CM, Council for Geosciences, 
Pretoria; NMQR, National Museum, Bloemfontein; RC, Rubidge Collection, 
Graaff-Reinet; SAM-PK, Iziko Museums of South Africa, Cape Town; YPM, 
Peabody Museum, Yale University, New Haven. 
Phylogenetic analysis. The phylogenetic relationships of Eunotosaurus africanus
were assessed using a novel character/taxon matrix consisting of 268 characters 
(174 cranial, 94 postcranial) scored for 47 taxa distributed broadly across amniotes 
with emphasis on Pan-Reptilia. Our primary purpose was to determine how an 
enhanced understanding of the cranial anatomy of Eunotosaurus affects its phylo- 
genetic status as an early stem turtle and the topological position of turtles in 
general. This is especially important considering that many of the more compel- 
ling Eunotosaurus-turtle synapomorphies are features associated with the iconic, 
shelled body plan**. While of considerable interest, the shared expression of these 
features also raises the question of convergence—perhaps there are only so many 
ways to build a shell. The skull serves as a relatively independent module on which 
to test the Eunotosaurus-turtle hypothesis. 

Tree topologies were inferred using both maximum parsimony and Bayesian 
optimality criteria. To specifically assess the dominant phylogenetic signal within 
the cranial data, separate analyses were performed on the complete matrix and a 
matrix restricted to cranial characters. The maximum parsimony tree was
generated using TNT 1.1°°. Seymouria spp. was specified as the outgroup, and
heuristic searches were conducted using tree-bisection-reconnection (TBR) branch
swapping with 1,000 replicates of random stepwise sequence addition. Branches
with a minimum length of zero were collapsed. Support for each node was measured by
calculating Bremer support and bootstrap frequencies, with 1,000 bootstrap repli- 
cates and 1,000 random sequence addition replicates. Characters 34, 62, 67, 85, 
107, 118, 148, 155, 193 and 220 (Supplementary Table 1) were treated as ordered
because their derived states were interpreted as non-mutually exclusive (that is, the
possession of either derived state reflects shared information that should be 
considered in the analysis). The implications of this approach were tested by 
comparing the results with iterations where all characters were analysed as 
unordered. 
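To illustrate why ordering matters, the sketch below counts the minimum number of steps for one character on a fixed tree with Sankoff's dynamic-programming algorithm, using two cost functions: 0/1 for unordered characters and |i − j| for ordered ones. It is a textbook illustration, not TNT's implementation; the three-state character and the two four-taxon trees are hypothetical.

```python
import math


def sankoff(tree, n_states, cost):
    """Minimum number of steps for one character on a fixed rooted tree.

    tree: a leaf state (int) or a (left, right) tuple of subtrees.
    cost: cost(i, j) of a change from state i to state j along one branch.
    """
    def visit(node):
        if isinstance(node, int):
            # A leaf costs nothing in its observed state and is impossible elsewhere.
            return [0 if s == node else math.inf for s in range(n_states)]
        left, right = (visit(child) for child in node)
        # Best cost of each subtree given that this node is in state s.
        return [
            min(cost(s, t) + left[t] for t in range(n_states))
            + min(cost(s, t) + right[t] for t in range(n_states))
            for s in range(n_states)
        ]
    return min(visit(tree))


def unordered(i, j):
    return 0 if i == j else 1   # any change is a single step


def ordered(i, j):
    return abs(i - j)           # 0 -> 2 must pass through 1


tree_a = ((0, 0), (1, 2))  # the two derived states grouped together
tree_b = ((0, 1), (0, 2))  # the derived states split between clades

print(sankoff(tree_a, 3, unordered), sankoff(tree_b, 3, unordered))  # 2 2
print(sankoff(tree_a, 3, ordered), sankoff(tree_b, 3, ordered))      # 2 3
```

Under unordered costs both trees need two steps, so the character cannot tell them apart; under ordered costs the tree that groups the two derived states is one step shorter, which is the sense in which possession of either derived state counts as shared information.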

Bayesian phylogenetic analyses were run using MrBayes (v3.2.2)** on the 
CIPRES Science Gateway”. The Mk model’® was used to analyse the full and 
cranial-only character matrices with gamma-distributed rate variation and
variable coding. All analyses were run for 30,000,000 generations with two
concurrent runs and four Metropolis-coupled chains (temperature T = 0.1),
sampling every 1,000 generations. Characters 34, 62, 67, 85, 107, 118, 148, 155, 193 and 220 were again
treated as ordered. All analyses were checked for convergence using standard 
MrBayes diagnostics (for example, PSRF < 0.01, mixing between chains > 20%)
and Tracer (v1.5)*” (for example, ESS > 200). A 25% relative burn-in was imple- 
mented for all summary statistics. 
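The Mk model is a Jukes–Cantor-like model for k states with equal rates and equal stationary frequencies. The sketch below, written for illustration and not taken from MrBayes, gives its transition probabilities for a branch of length v (expected changes per character) and shows, for the simplest two-taxon case, what conditioning on variable characters does: each variable pattern's likelihood is divided by the probability that the character is variable at all. Gamma rate variation and ordered characters are omitted, and the function names are hypothetical.

```python
import math


def mk_prob(i: int, j: int, k: int, v: float) -> float:
    """P(state j at the end of a branch | state i at its start) under Mk.

    k: number of states; v: branch length in expected changes per character.
    """
    decay = math.exp(-k * v / (k - 1))
    if i == j:
        return 1 / k + (k - 1) / k * decay
    return 1 / k - decay / k


def two_taxon_mkv(i: int, j: int, k: int, v: float) -> float:
    """Likelihood of states (i, j) in two taxa joined by a path of length v,
    conditioned on the character being variable (an Mkv-style correction)."""
    raw = mk_prob(i, j, k, v) / k  # starting state drawn with equal frequency 1/k
    p_constant = sum(mk_prob(s, s, k, v) / k for s in range(k))
    return raw / (1 - p_constant)


# Each row of the transition matrix sums to 1.
print(sum(mk_prob(0, j, 3, 0.5) for j in range(3)))
# With two states and two taxa, (0, 1) and (1, 0) are the only variable patterns.
print(two_taxon_mkv(0, 1, 2, 0.3))
```

Because constant characters are never scored, the conditioned likelihoods of all variable patterns sum to one; without the correction, branch lengths would tend to be overestimated from a matrix that excludes invariant characters.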

A list of the cranial characters and their definitions is provided as 
Supplementary Table 1. Postcranial characters are taken directly from ref. 3. 
Supplementary Table 2 provides character scores and Supplementary Table 3 
provides a list of synapomorphies for each of the reptile clades within which 
Eunotosaurus is nested. Supplementary Table 4 lists the observed specimens 
and primary references from which we compiled our character matrix. 
Extended Data Figure 1 | Digital reconstruction of segmented cranial
elements of Eunotosaurus africanus CM 777. a, Palatal view with the
lower jaws digitally removed and major roofing elements not rendered.
b, Anteromedial view of left palate showing moderately sized suborbital
fenestra. c, d, Posteromedial (c) and anterolateral (d) views of left quadrate,
prootic, stapes, epipterygoid and midline parabasisphenoid. e, Right lateral
view of anterior braincase wall and surrounding elements. Note sutural contact
of prootic and quadrate. f-k, Lower jaws in dorsal (f), ventral (g), left lateral (h),
medial (left jaw) (i), anterior (j), and posterior (k) views. an, angular; ar,
articular; bs, parabasisphenoid; co, coronoid; d, dentary; ect, ectopterygoid; epi,
epipterygoid; fr, frontal; ju, jugal; la, lacrimal; ls, laterosphenoid; mx, maxilla;
op, opisthotic; pa, parietal; pf, prefrontal; pl, palatine; pm, premaxilla; po,
postorbital; pof, postfrontal; pr, prootic; pra, prearticular; pt, pterygoid; q,
quadrate; qj, quadratojugal; s, stapes; sq, squamosal; sof, suborbital fenestra; sp,
splenial; st, supratemporal; sa, surangular; v, vomer; II, inferred exit point for
orbital nerve; V, prootic incisure, exit point for trigeminal nerve.


Extended Data Figure 2 | The juvenile skull of Eunotosaurus africanus
(SAM-PK-K7909) showing clear expression of both LTF and UTF. a, b, Left
lateral view with the rostrum held horizontally (a) and slightly downturned
(b). c, Close-up view of fenestrated cheek in right lateral view. The size of the
fenestrae decreases in the late stages of postnatal ontogeny through expansion
of the surrounding dermal bones. The upper temporal fenestra is eventually
obscured by the late-stage ontogenetic development of an elongate
supratemporal.




LETTER 




Extended Data Figure 3 | Digitally segmented and reconstructed cranial
elements of Eunotosaurus africanus (CM86-341). a-d, Right lateral view (a),
anterior (b), posterior (c), and right medial (d) views. an, angular; d,
dentary; epi, epipterygoid; ju, jugal; la, lacrimal; ‘ls’, ‘laterosphenoid’; mx,
maxilla; pa, parietal; pf, prefrontal; pl, palatine; po, postorbital; pof, postfrontal;
pt, pterygoid; q, quadrate; qj, quadratojugal; sa + ar, surangular and articular;
sq, squamosal; st, supratemporal; UTF, upper temporal fenestra; ?, unclear
identity.
Extended Data Figure 4 | Strict consensus of two most parsimonious trees
recovered from the total character matrix. Bremer support values are provided
for each clade (above line). Bootstrap values exceeding 50% are provided
(below line). A Eunotosaurus-turtle clade is extremely well supported. That this
pan-turtle lineage originated somewhere within the radiation of anatomically
diapsid reptiles is well supported, although a refined phylogenetic position
remains morphologically elusive. Tree length = 1,087; consistency
index = 0.4013; retention index = 0.590.
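The consistency and retention indices quoted in these captions follow the standard ensemble definitions: with m the minimum number of steps a character could possibly require, s the steps it requires on the tree, and g the steps it would require on a completely unresolved tree, each summed over characters to M, S and G, CI = M/S and RI = (G − S)/(G − M). The sketch below shows the arithmetic with two hypothetical characters; it does not reproduce the values reported here, which depend on the full matrix and on which characters the software counts.

```python
def ensemble_indices(chars):
    """Return (CI, RI) for a list of (m, s, g) step counts, one tuple per character.

    m: minimum conceivable steps; s: steps on the tree; g: steps on a star tree.
    """
    m = sum(c[0] for c in chars)
    s = sum(c[1] for c in chars)
    g = sum(c[2] for c in chars)
    ci = m / s              # 1.0 would mean no homoplasy at all
    ri = (g - s) / (g - m)  # fraction of potential synapomorphy retained on the tree
    return ci, ri


# Two hypothetical binary characters.
ci, ri = ensemble_indices([(1, 2, 3), (1, 1, 4)])
print(f"CI = {ci:.3f}, RI = {ri:.3f}")  # CI = 0.667, RI = 0.800
```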
Extended Data Figure 5 | Bayesian tree topology derived from total matrix
(50% majority rule consensus). An exclusive Eunotosaurus-turtle clade is
recovered with 100% posterior probability. This pan-turtle lineage is nested
within the radiation of anatomically diapsid reptiles; however, in contrast to the
parsimony solution, turtles are excluded from crown-group Diapsida. The
Bayesian results agree with the parsimony results in revealing strong support that:
(1) Eunotosaurus is an early stem-group turtle; and (2) the ancestral stem turtle
expressed a fully diapsid skull. The two analyses also agree that there is
currently poor morphological support for a refined position of turtles within
the greater diapsid radiation.
recovered from cranial-only matrix. The Eunotosaurus-turtle clade is
recovered, which supports the hypothesis that the postcranial synapomorphies
[...] convergence. Tree length = 777; consistency index = 0.3956; retention
index = 0.4743.
Extended Data Figure 7 | Bayesian tree topology derived from cranial-only matrix (50% majority rule consensus). When studied in isolation, cranial anatomy
provides poor resolution of the deep divergences within Pan-Reptilia, but a Eunotosaurus-turtle signal is clearly present. 
Arithmetic and local circuitry underlying dopamine prediction errors


Neir Eshel, Michael Bukwich, Vinod Rao, Vivian Hemmelder, Ju Tian & Naoshige Uchida


Dopamine neurons are thought to facilitate learning by comparing 
actual and expected reward'”. Despite two decades of investiga- 
tion, little is known about how this comparison is made. To deter- 
mine how dopamine neurons calculate prediction error, we 
combined optogenetic manipulations with extracellular record- 
ings in the ventral tegmental area while mice engaged in classical 
conditioning. Here we demonstrate, by manipulating the temporal 
expectation of reward, that dopamine neurons perform subtrac- 
tion, a computation that is ideal for reinforcement learning but 
rarely observed in the brain. Furthermore, selectively exciting and 
inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in
the ventral tegmental area reveals that these neurons are a source of 
subtraction: they inhibit dopamine neurons when reward is 
expected, causally contributing to prediction-error calculations. 
Finally, bilaterally stimulating ventral tegmental area GABA neu- 
rons dramatically reduces anticipatory licking to conditioned 
odours, consistent with an important role for these neurons in 
reinforcement learning. Together, our results uncover the arith- 
metic and local circuitry underlying dopamine prediction errors. 

Associative learning depends on comparing predictions with
outcomes**. When outcomes match predictions, learning is not required.
When outcomes violate predictions, animals must update their pre- 
dictions to reflect experience. Dopamine neurons are thought to pro- 
mote this process by encoding reward prediction error, or the 
difference between the reward an animal receives and the reward it 
expected to receive’ (see Supplementary Information). 

Despite extensive study, how dopamine neurons calculate predic- 
tion error remains largely unknown. Theories of reinforcement learn- 
ing predict that dopamine neurons perform subtraction, simply 
calculating actual reward minus predicted reward (or, in temporal 
difference models, the value of the current state minus the value of 
the previous state)’. However, dopamine neurons could also perform 
division, an equally fundamental and arguably more common neural 
computation’. The arithmetic underlying prediction errors has never 
been investigated. 
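The subtraction these theories assume can be written as a temporal-difference (TD) error, δ = r + γV(next state) − V(current state). The sketch below evaluates it for a cue that has been fully learned to predict one unit of reward; the value V = 1, the discount γ = 1 and the function name are illustrative assumptions, not parameters from this study.

```python
def td_error(reward: float, v_next: float, v_current: float, gamma: float = 1.0) -> float:
    """Temporal-difference prediction error: what arrived minus what was predicted."""
    return reward + gamma * v_next - v_current


V_CUE = 1.0  # hypothetical value of a cue that reliably predicts a reward of size 1

# Cue onset: moving from the inter-trial interval (value 0) into the cue state.
print(td_error(reward=0.0, v_next=V_CUE, v_current=0.0))  # 1.0
# Reward after the cue: the learned prediction is cancelled by subtraction.
print(td_error(reward=1.0, v_next=0.0, v_current=V_CUE))  # 0.0
# Reward with no cue at all: nothing is subtracted.
print(td_error(reward=1.0, v_next=0.0, v_current=0.0))    # 1.0
```

A divisive scheme would instead scale the reward response by a factor that depends on expectation, so the two accounts make different predictions about how the effect of expectation varies with reward size.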

To probe how dopamine neurons calculate prediction error, we 
recorded from the ventral tegmental area (VTA) (Extended Data 
Figs 1a and 2a-c) while mice (n = 5) performed a classical
conditioning task with two interleaved trial types (Fig. 1a). In roughly half of the
trials, we delivered reward unexpectedly, in the absence of any cue. In 
these trials, both the timing and size of reward were unexpected. In the 
other half of the trials, an odour cue predicted the timing of reward, but 
the size was still unexpected. By comparing responses to these two trial 
types, we could determine how temporal expectation modulates indi- 
vidual dopamine neurons across a range of firing rates. The light-gated 
ion channel, channelrhodopsin (ChR2), was expressed selectively in 
dopamine neurons, enabling us to identify neurons as dopaminergic 
on the basis of their responses to light® (Extended Data Fig. 3a-g). 

Consistent with previous results*’, dopamine neurons increased 
their responses with increasing reward size (example neuron in 
Extended Data Fig. 4a). Much like sensory neurons in response to 
stimuli of increasing intensity, dopamine neurons showed a gradual, 


monotonic response, well-fit by a saturating Hill function (orange 
trace in Fig. 1c; note that VTA GABA neurons do not show the same 
monotonic response, Extended Data Fig. 5). 

When reward was temporally expected, the responses of dopamine 
neurons were suppressed (P < 0.001, t-test; example neuron in
Extended Data Fig. 4a; population in Fig. 1b). To determine the nature 
of the suppression, we performed two complementary analyses. First, 
we fitted dopamine responses both with subtractive and with divisive 
models (Extended Data Fig. 6a). We found that subtraction was a 
significantly better fit (P < 0.001, bootstrap; Fig. 1c and Extended
Data Fig. 4b). Second, we plotted the effect of temporal expectation 
across reward sizes and measured the slope. A divisive process would 
produce a positive slope, as division should have a larger effect on larger 
dopamine responses. In contrast, subtraction would produce a slope 
near zero. We found the latter; regardless of reward size, the odour cue 
simply shifted the dose-response curve by a constant amount 
(P > 0.05, linear regression; Fig. 1d). This subtractive pattern held
not just for the population, but also for 35 out of 40 individual neurons 
(Extended Data Fig. 4c). Thus, consistent with classic theories of rein- 
forcement learning, dopamine neurons appear to be performing sub- 
traction (specifically, output subtraction®; see Extended Data Fig. 6a). 
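The logic of the slope test can be checked with a toy model. The sketch below assumes a saturating Hill function for the unexpected-reward response (all parameter values and reward sizes are hypothetical), applies either a fixed subtraction or a fixed division to mimic expectation, and regresses the unexpected-minus-expected difference on reward size. It is not the authors' fitting procedure, only an illustration of why subtraction predicts a flat difference and division a rising one.

```python
def hill(x: float, r_max: float = 20.0, k_half: float = 4.0, n: float = 1.5) -> float:
    """Saturating Hill function: rises monotonically with x and levels off at r_max."""
    return r_max * x**n / (x**n + k_half**n)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


sizes = [0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0]  # hypothetical reward sizes
unexpected = [hill(x) for x in sizes]
subtracted = [r - 3.0 for r in unexpected]     # expectation removes a fixed amount
divided = [r / 1.5 for r in unexpected]        # expectation scales every response down

diff_sub = [u - e for u, e in zip(unexpected, subtracted)]
diff_div = [u - e for u, e in zip(unexpected, divided)]
print(f"subtractive slope: {ols_slope(sizes, diff_sub):+.4f}")
print(f"divisive slope:    {ols_slope(sizes, diff_div):+.4f}")
```

Under division the difference equals one-third of the unexpected response here, so it grows with reward size; under subtraction it is the same constant at every size, giving a slope of essentially zero.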

Having established the computation, we next wished to determine the 
input that dopamine neurons subtract. A variety of biological models 
have been proposed to explain the neural circuit required to calculate 
prediction errors. Some of these models have situated the calculation at 
the level of the dopamine neurons””®, while others have suggested that 
the calculation happens upstream, for instance in the lateral habe- 
nula’”, and is then relayed to dopamine neurons. Recently, we demon- 
strated that GABAergic neurons in the VTA encode reward expectation, 
showing sustained responses that vary with the timing and size of 
expected reward®. Although these neurons are known to synapse onto 
nearby dopamine neurons” and appear to play a role in conditioned 
behaviour'*"*, there has been no direct evidence that dopamine neurons 
use the VTA GABA signal for prediction-error calculations.
Furthermore, although some models of prediction-error calculations 
call for a ramping expectation function®'®’’, which resembles VTA 
GABA activity, others call for phasic, precisely timed expectation sig- 
nals'*°. Our study allows us to distinguish between these possibilities. 

Since we know the normal firing patterns of VTA GABA neurons 
during classical conditioning®, our strategy was to mimic this firing 
and determine whether it induces subtraction of dopamine neuron 
responses. In a separate set of mice (n = 5), ChR2 was expressed selec- 
tively in VTA GABA neurons, enabling us to stimulate these neurons 
while recording from putative dopamine neurons (Extended Data Figs 
1b and 2d-f). Much like the previous task, we unexpectedly delivered 
rewards of various sizes (Fig. 2a). In half of the trials, reward was
delivered alone; in the other half, reward was delivered during 40 Hz 
VTA GABA stimulation. 

First we confirmed that ChR2 stimulation efficiently excited VTA 
GABA neurons (P < 0.001, paired t-test), adding about ten spikes per 
second to the baseline firing rate of the neurons (example neuron 
in Extended Data Fig. 4d; population in Fig. 2b). This laser-evoked
activity roughly resembled the normal activity of VTA GABA neurons
during classical conditioning (Extended Data Fig. 2f).

Figure 1 | Expectation triggers subtraction of dopamine neuron responses.
a, Dopamine (DA) identification recording setup (left) and task (right). ITI,
inter-trial interval. b, Dopamine neuron firing rates (mean ± s.e.m. across
neurons) for unexpected (orange) or temporally expected (black) reward.
***P < 0.001, t-test. c, Dopamine neuron responses (mean ± s.e.m.) to
different reward sizes. Orange line, fit for unexpected reward. Dotted black line,
divisive transformation. Solid black line, subtractive transformation.
Subtraction was a better fit (***P < 0.001, bootstrap; see Methods and
Extended Data Fig. 6e). d, Difference between unexpected and expected reward
responses (mean ± s.e.m.) as a function of reward size.

Next we assessed how VTA GABA stimulation affected putative 
dopamine neuron responses to reward. As expected, GABA stimu- 
lation significantly suppressed dopamine reward responses 
(P < 0.001, t-test; example neuron in Extended Data Fig. 4e;
population in Fig. 2c). This suppression could not be fully explained by a
shift in baseline activity (Extended Data Fig. 7a–d). Moreover, the
dopamine suppression was not due to an association between blue 
light and reward, as laser delivery failed to elicit expectation-related 
licking behaviour (Extended Data Fig. 8b). Indeed, a separate group of 
control mice (n = 2) expressing green fluorescent protein (GFP) rather 
than ChR2 in GABA neurons (Extended Data Fig. 2g-i) showed no 
effect of laser stimulation (P = 0.78, Fig. 2e, f). 

We confirmed that stimulating VTA GABA neurons suppresses 
phasic dopamine activity, but what is the shape of this suppression? 
As in our previous experiment, we determined a dopamine dose- 
response curve and fitted both subtractive and divisive models. We found 
that the effect of VTA GABA stimulation was subtractive (P < 0.05,
bootstrap; Fig. 2d and Extended Data Fig. 6f, g). This subtractive effect
held even when correcting for the baseline-lowering effect of GABA
stimulation (P < 0.05; see Methods and Extended Data Fig. 6h, i). We
conclude that VTA GABA activation mimics the effect of temporal 
expectation on putative dopamine neurons. 

Although we show that VTA GABA activity can account for 
expectation-like changes in dopamine responses, this does not dem- 
onstrate that VTA GABA neurons normally play such a role. To 
strengthen the causal link between VTA GABA activity and dopamine 
prediction-error coding, we inhibited VTA GABA neurons during 
their normal period of activity and asked whether this disrupts dopa- 
mine prediction errors. In a separate group of mice (n = 7), the light- 
sensitive inhibitory proton pump archaerhodopsin (ArchT)?' was 
expressed selectively in VTA GABA neurons (Extended Data Figs 
1c, e-g and 9a-c). Mice were trained in a two-odour classical condi- 
tioning task, in which odour A predicted reward with 10% probability 
and odour B predicted reward with 90% probability (Fig. 3a). In 25% of 
the trials, we delivered green laser to activate ArchT and inhibit VTA 
GABA neurons for 1 s around reward outcome. 

We first confirmed that laser stimulation significantly suppressed 
expectation-related activity in putative VTA GABA neurons
(P = 0.001, t-test; individual neurons in Extended Data Fig. 4f, h; 
population in Fig. 3b). Next, we assessed how inhibiting VTA 
GABA neurons modified dopamine activity. Normally, putative dopa- 
mine neurons had reduced reward responses when a cue predicted 
reward delivery (P < 0.001, paired t-test; Fig. 3c). Inhibiting VTA
GABA neurons partially reversed this expectation-dependent reduc- 
tion (individual dopamine neurons in Extended Data Fig. 4g, i; popu- 
lation in Fig. 3d). Thus, when VTA GABA neurons are inhibited, 
dopamine neurons respond as if reward is less expected. This change 
was specific to phasic reward responses, and not due solely to a shift in 
baseline activity (Extended Data Figs 7e-h and 10). Combined with 
our ChR2 experiment, these results suggest that VTA GABA neurons 
play a causal role in dopamine prediction-error coding. In particular, 
they help provide the burst-cancelling expectation signal long anticipated by models of reinforcement learning.

In Figs 2 and 3 we report that VTA GABA excitation and inhibition
modulate dopamine prediction-error responses. However, our unilateral optogenetic manipulation did not modify mouse behaviour
(Extended Data Fig. 8b, c). To determine whether the VTA GABA
expectation signal is important for learning, we designed an additional 
experiment with bilateral manipulation. In a separate group of mice 
(n = 6), ChR2 was expressed selectively in VTA GABA neurons bilat- 
erally. The mice performed a four-odour classical conditioning task, in 
which odour A was associated with large reward, odours B and D were 
associated with small reward, and odour C was associated with no 
reward (Fig. 4a). After training, odour D trials were paired with 
VTA GABA stimulation. Importantly, the odour-reward associations 


Figure 2 | Selective excitation of VTA GABA neurons mimics the effect of 
expectation. a, GABA stimulation (stim.) recording setup (left) and task 
(right). b, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons with
(blue) and without (black) ChR2 stimulation. Light blue box, laser delivery.
c, Firing rate (mean ± s.e.m.) of putative dopamine neurons. ***P < 0.001,
t-test. d, Dopamine neuron responses (mean ± s.e.m.) to different reward
sizes. Black line, fit for unexpected reward. Dotted blue line, divisive 
transformation. Solid blue line, subtractive transformation. Subtraction 
was a better fit (*P < 0.05, bootstrap; see Extended Data Fig. 6g). e, f, Same 
as c and d, respectively, but in GFP-expressing control animals. 
Figure 3 | Selective inhibition of VTA GABA neurons modulates prediction
errors. a, GABA inhibition (inhib.) recording setup (left) and task (right).
b, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during odour B
trials with (green) or without (black) laser delivery. ***P < 0.001, paired t-test.
c, Firing rate (mean ± s.e.m.) of putative dopamine neurons when reward
was delivered after odour A (orange) or odour B (black). ***P < 0.001, paired 
t-test. d, Same as b, but for putative dopamine neurons. ***P < 0.001, paired 
t-test. 


always remained the same. Our hypothesis was that over time, laser 
stimulation would reduce dopamine prediction-error responses for 
odour D. As a result, the expected value of odour D should decrease, 
and mice should lick less for odour D compared with odour B, even 
though the reward was the same. Indeed, this is what we found: after 
the laser was introduced, mice licked significantly less for odour D than 
for odour B (P < 0.001, laser × odour interaction, mixed effects linear
model, Fig. 4b and Extended Data Fig. 8d). This reduction did not 
occur in a separate group of control mice (n = 6) that did not express 
ChR2 (Extended Data Fig. 8e). Although there was probably a direct 
effect of GABA stimulation on licking behaviour, as previously discovered, this cannot account for the entire difference, because the
reduction remained significant on probe trials, where odour D was not 
Figure 4 | Bilateral excitation of VTA GABA neurons disrupts learned
association. a, Schematic of optogenetic setup (left) and behavioural task
(right). b, For a representative mouse (one of six mice injected with ChR2),
anticipatory licks during each session (mean ± s.e.m. across trials) for odours A
(black), B (dark grey), C (light grey), and D (blue). For sessions 12-17 (pale
yellow), odour D was paired with laser. ***P < 0.001, laser × odour
interaction, mixed effects model. c, Ratio of anticipatory licks for odour D 
versus odour B during laser sessions. Circles, mice injected with ChR2 (blue) or 
GFP (yellow). Open circles, probe trials, where laser was omitted after odour D. 
*P < 0.05; ***P < 0.001; Wilcoxon rank-sum test. 
paired with laser stimulation (Fig. 4c). In other words, previous laser
trials caused the mice to learn a new, reduced value for odour D, which 
persisted even in the absence of the laser. In the prediction-error frame- 
work, this new value may have been learned through GABA-induced 
dips in dopamine firing (see Supplementary Information). Consistent 
with our physiology results, our behavioural findings imply an import- 
ant role for VTA GABA neurons in prediction-error learning. 

Our study provides the first direct evidence for the arithmetic of 
dopamine prediction errors. Subtraction is an ideal process for 
prediction-error coding because it maintains a faithful separation 
between expected and unexpected rewards, even at the extremes of 
reward size (Extended Data Fig. 6a). Indeed, most, if not all, models of 
reinforcement learning have used subtraction to compute prediction 
error. However, although cortical pyramidal neurons appear capable
of subtracting GABA input, and modelling studies have explored
the biophysics of this process, surprisingly few examples of subtraction have been observed in natural settings in vivo. Our finding
that reward expectation reduces dopamine reward responses in a 
purely subtractive manner sheds light on how such a computation 
can emerge from a network of neurons, and may provide a framework 
for other prediction-related processes in the brain. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Animals. We used 33 adult male mice, backcrossed for more than 5 generations
with C57BL/6J mice, that were heterozygous for Cre recombinase under the
control of either the DAT gene (B6.SJL-Slc6a3tm1.1(cre)Bkmn/J, The Jackson
Laboratory) or the Vgat gene (Vgat-ires-Cre). We used 5 animals in the
dopamine-identification task (Fig. 1), 7 in the GABA stimulation task (Fig. 2), 9
in the GABA inhibition task (Fig. 3), and 12 in the behavioural experiment (Fig. 4).
Animals were housed on a 12 h dark/12 h light cycle (dark from 7:00 to 19:00) and
performed the task at the same time each day. In the behavioural experiment, 
animals were randomly assigned to either the experimental or control groups, and 
the experimenters were blinded to the assignment during all surgeries, behavioural 
sessions, and individual mouse analyses. All procedures were approved by the 
Harvard University Institutional Animal Care and Use Committee. 

Surgery and viral injections. All surgeries were performed under aseptic conditions with animals under either ketamine/medetomidine (60 and 0.5 mg kg−1,
intraperitoneal, respectively) or isoflurane (1-2% at 0.5-1.0 l min−1) anaesthesia.
Analgesia (ketoprofen, 5 mg kg−1 intraperitoneally; buprenorphine, 0.1 mg kg−1,
intraperitoneally) was administered postoperatively. For the recording experi- 
ments, mice underwent two surgeries, both stereotactically targeting left VTA 
(from bregma: 3.0mm posterior, 0.8mm lateral, 4-5 mm ventral). In the first 
surgery, we injected 200-500 nl adeno-associated virus (AAV) to enable cell-type 
identification or manipulation (see below). After 2-4 weeks, we performed a 
second surgery to implant a head plate and microdrive containing six to eight 
tetrodes and an optical fibre, as described. Recording sites are displayed in
Extended Data Fig. 1. For the behavioural experiment, mice underwent a single 
surgery in which we injected 500 nl AAV into VTA bilaterally, and then implanted 
a headplate and a dual-optic fibre cannula (300 µm diameter, Doric Lenses) custom-designed to target bilateral VTA.

The viral injections differed among the four experiments. In the dopamine
identification experiment (Fig. 1), we injected AAV (serotype 5) carrying an
inverted ChR2 (H134R) fused to the fluorescent reporter enhanced yellow fluorescent protein (eYFP) and flanked by double loxP sites. We previously showed
that expression of this virus in dopamine neurons is highly selective and efficient.
In both the GABA stimulation experiment (Fig. 2) and the behavioural experiment 
(Fig. 4), we injected the same AAV-FLEX-ChR2-eYFP construct or, for control 
mice, we injected AAV5-GFP (University of North Carolina Vector Core). 
Finally, in the GABA inhibition experiment (Fig. 3), we injected AAV (serotype 
1 or 8) carrying an inverted ArchT fused to the fluorescent reporter GFP and
flanked by double loxP sites (University of North Carolina Vector Core). 
Expression of ArchT was almost 100% selective to GABA neurons and about 
50% efficient, for both AAV1 and AAV8 (Extended Data Fig. 1e-g). In both the 
ChR2 and ArchT experiments, no virus-expressing cell bodies were observed 
distant from the injection site (for example, in the striatum or the cortex), implying 
that the virus was not taken up by axons in the VTA and transported retrogradely 
to input areas. 
Behavioural tasks. After more than 1 week of recovery, mice were water-restricted 
in their cages. Weight was maintained above 90% of baseline body weight. Animals 
were head-restrained and habituated for 1-2 days before training. Odours were 
delivered with a custom-made olfactometer. Each odour was dissolved in mineral oil at 1/10 or 1/100 dilution. Thirty microlitres of diluted odour was placed
inside a filter-paper housing, and then further diluted with filtered air by 1:20 to 
produce a 1,000 ml min−1 total flow rate. Odours included isoamyl acetate,
(+)-carvone, 1-hexanol, p-cymene, ethyl butyrate, and 1-butanol, and differed 
for different animals. In the recording experiments, licks were detected by breaks 
of an infrared beam placed in front of the water tube. In the behavioural experiments, licks were detected by contact with a water tube connected to a capacitive
sensing circuit (Teensy, PJRC). 

Each trial began with 1 s odour delivery, followed by a delay (either 0.5 s or 1 s),
and a reward outcome. In the dopamine identification experiment (Fig. 1), the
outcome ranged from 0.1 µl to 20 µl water; in the GABA stimulation experiment
(Fig. 2), the outcome ranged from 0.3 µl to 10 µl water; in the GABA inhibition
experiment (Fig. 3), the outcome was either 0 µl or 3.75 µl water; and in the
behavioural experiment (Fig. 4), the outcome was 0, 2, or 5 µl water. Inter-trial
intervals were drawn from an exponential distribution (mean 7.6 s), resulting in a 
flat hazard function such that mice had constant expectation of when the next trial 
would begin. The tasks were purely classical conditioning: the behaviour of the 
mice had no effect on the outcomes. Animals performed between 300 and 700 
trials per session. 
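The flat hazard function follows from the memoryless property of the exponential distribution: the chance that the next trial starts in the coming second does not depend on how long the mouse has already waited. A minimal sketch, assuming NumPy; the function names `draw_itis` and `empirical_hazard` and the 1 s hazard bin are illustrative choices, not taken from the original task code.

```python
import numpy as np

def draw_itis(n_trials, mean_s=7.6, seed=0):
    """Draw inter-trial intervals (s) from an exponential distribution."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=mean_s, size=n_trials)

def empirical_hazard(itis, bin_s=1.0, max_s=15.0):
    """Per-bin probability that the next trial starts in [t, t + bin_s),
    given that it has not started before t."""
    edges = np.arange(0.0, max_s + bin_s, bin_s)
    hazard = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        at_risk = np.sum(itis >= lo)                 # intervals still waiting at t = lo
        events = np.sum((itis >= lo) & (itis < hi))  # trials that start within this bin
        hazard.append(events / at_risk)
    return np.array(hazard)

itis = draw_itis(200_000)
h = empirical_hazard(itis)
print(itis.mean())  # should be close to 7.6 s
print(h)            # roughly flat, near 1 - exp(-1 / 7.6) per 1 s bin
```

A uniform or fixed ITI would instead make the hazard rise over time, letting the animal anticipate trial onset.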

The dopamine identification experiment (Fig. 1) included three trial types, ran- 
domly intermixed. In trial type 1 (45% of all trials), an odour was delivered for 1 s, 
followed by a 0.5 s delay and a reward chosen pseudo-randomly from the following 
set: 0.1, 0.3, 1.2, 2.5, 5, 10, or 20 µl. The frequency of each reward size was chosen to
make the average reward approximately 5 µl. Reward sizes were determined by the
length of time the water valve remained open: 4, 12, 25, 45, 75, 140, or 250 ms,
respectively. In trial type 2 (45% of all trials), rewards of various sizes were delivered 
without any preceding odour. The reward sizes were identical to trial type 1. In 
these trials, the reward itself was considered the start of the trial, to ensure a flat 
hazard function. Comparing trial types 1 and 2 allowed us to determine 
how a constant level of expectation modulated responses to different sizes of 
reward. In trial type 3 (10% of all trials), a different odour was delivered, which 
was followed by no outcome. This trial type was included to ensure that the animals 
learned the task: they began to lick after the odour in trial type 1 but not after the 
odour in trial type 3 (Extended Data Fig. 8a). 
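One way to generate such a pseudo-random schedule is to shuffle fixed-composition blocks, which guarantees the 45/45/10 split within every block. This is a hypothetical sketch: the 20-trial block size is an assumption, and because the reward-size frequencies used in the experiment are not listed here, sizes are drawn uniformly as a placeholder. Only the reward sizes and valve-open times come from the text.

```python
import numpy as np

REWARD_SIZES_UL = np.array([0.1, 0.3, 1.2, 2.5, 5, 10, 20])
VALVE_MS = np.array([4, 12, 25, 45, 75, 140, 250])  # valve-open time for each size (from the text)

def make_schedule(n_blocks, seed=0):
    """Return trial types (1, 2, 3), reward sizes (ul) and valve-open times (ms).

    Hypothetical scheme: each 20-trial block holds 9 type-1, 9 type-2 and
    2 type-3 trials (45% / 45% / 10%), shuffled independently."""
    rng = np.random.default_rng(seed)
    block = np.array([1] * 9 + [2] * 9 + [3] * 2)
    types = np.concatenate([rng.permutation(block) for _ in range(n_blocks)])
    # Placeholder: uniform reward-size choice; the experiment's weighting is not given here.
    idx = rng.integers(0, len(REWARD_SIZES_UL), size=types.size)
    reward = np.where(types == 3, 0.0, REWARD_SIZES_UL[idx])  # type 3: odour, no outcome
    valve = np.where(types == 3, 0, VALVE_MS[idx])
    return types, reward, valve

types, reward, valve = make_schedule(n_blocks=25)
```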

The GABA stimulation experiment (Fig. 2) mimicked the dopamine identifica- 
tion experiment, but instead of delivering a reward-predicting odour, we used a 
blue laser to activate VTA GABA neurons directly. The experiment included three 
randomly interleaved trial types. In trial type 1 (5% of trials), rewards were deliv- 
ered unexpectedly, in the absence of laser stimulation. Reward sizes were chosen 
pseudo-randomly from the following set: 0.3, 1.2, 2.5, 5, or 10 µl. Each reward size
was equally frequent. In trial type 2 (5% of trials), rewards were also delivered 
unexpectedly, but now in the presence of laser stimulation. The laser was delivered 
at 40 Hz for a total of 1 s, and reward was delivered in the middle of this period. In 
trial type 3 (90% of trials), laser was delivered at 40 Hz for a total of 1 s, but no
reward was delivered. The reason for the prevalence of trial type 3 was to ensure 
that mice did not associate the laser (which they might have seen, despite attempts 
to mask the light by painting the fibre black) with reward delivery. 
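At 40 Hz for 1 s, each stimulated trial contains 40 light pulses. A sketch of the binary laser command, assuming the 5 ms pulse width given under Laser delivery and a hypothetical 10 kHz command sampling rate; integer sample arithmetic avoids floating-point drift at the pulse edges.

```python
import numpy as np

def pulse_train(rate_hz=40, duration_s=1.0, pulse_ms=5.0, fs=10_000):
    """Boolean laser command: pulse_ms pulses at rate_hz for duration_s, sampled at fs (Hz)."""
    n = int(round(duration_s * fs))
    period = int(round(fs / rate_hz))          # samples per cycle (250 at 40 Hz, 10 kHz)
    width = int(round(pulse_ms * fs / 1000))   # samples per pulse (50 for 5 ms)
    return (np.arange(n) % period) < width

cmd = pulse_train()
n_pulses = int(np.sum(np.diff(cmd.astype(int), prepend=0) == 1))  # count rising edges
reward_sample = cmd.size // 2  # trial type 2: reward in the middle of the 1 s train
print(n_pulses, reward_sample)  # 40 5000
```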

In the GABA inhibition experiment (Fig. 3), each trial began with one of two 
odours, selected pseudo-randomly. One odour predicted water reward with 10% 
probability and the other odour predicted water reward with 90% probability. In 
25% of these trials, 1 s of continuous green laser was administered, beginning at 
odour offset and lasting until 0.5 s after reward was delivered. This encompassed 
both the delay between odour and reward (1-1.5 s) and the reward response period 
(1.5-2 s), which are the times in which VTA GABA neurons normally fire. Laser
stimulation did not affect licking behaviour (Extended Data Fig. 8c). At the beginning and end of each recording session, we delivered 1 s periods of green laser
without any odours or rewards, to assess how GABA inhibition modulated dopa- 
mine baseline activity. 

The behavioural experiment (Fig. 4) included four trial types, each associated 
with a different odour. The four trial types were pseudo-randomly interleaved and 
equally likely. Odour A was associated with big reward (5 µl), odours B and D were
associated with small reward (2 µl), and odour C was associated with no reward.
After training, when the mice consistently associated the odours with reward (as 
demonstrated by their anticipatory licking behaviour), blue laser was paired with 
odour D trials. Laser was delivered for 2.5 s, beginning 0.5 s after odour onset and 
ending 0.5 s after reward onset. The intensity of light was modulated in a ramping 
fashion (see below). After six to eight sessions using the laser, the laser was turned 
off for the remaining four or five sessions, allowing us to examine whether the 
effect of laser stimulation would persist even in the absence of laser. Additionally, 
to clarify whether behaviour changes reflected learning or a direct effect of VTA 
GABA stimulation on licking, we included probe trials in the final two or three 
laser sessions. During these probe sessions, 10% of odour B trials randomly 
received laser stimulation, and 10% of odour D trials randomly omitted the laser. 
Electrophysiology. Recording techniques were based on a previous study. Briefly,
we recorded extracellularly from VTA using a custom-built, screw-driven microdrive containing six or eight tetrodes (Sandvik) glued to a 200 µm optic fibre
(ThorLabs). Tetrodes were affixed to the fibre so that their tips extended 300-600 µm from the end of the fibre. Neural and behavioural signals were recorded
with a DigiLynx recording system (Neuralynx) or a custom-built system using a 
multi-channel amplifier chip (RHA2116, Intan Technologies) and data acquisi- 
tion device (PCIe-6351, National Instruments). Broadband signals from each wire 
were filtered between 0.1 and 9,000 Hz and recorded continuously at 32 kHz. To 
extract spike timing, signals were band-pass-filtered between 300 and 6,000 Hz 
and sorted offline using SpikeSort3D (Neuralynx) or MClust-3.5 (A. D. Redish). 
At the end of each session, the fibre and tetrodes were lowered by 40-80 µm to
record new units the next day. 
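The spike-band filtering step can be illustrated with SciPy. This is a sketch only: the recordings were processed in the Neuralynx and MClust toolchain, and the 4th-order Butterworth design applied forwards and backwards is an assumption, since the text gives only the 300-6,000 Hz pass band and the 32 kHz sampling rate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 32_000  # Hz, continuous sampling rate used for the broadband signal

def spike_band(x, low=300.0, high=6000.0, fs=FS, order=4):
    """Zero-phase 300-6,000 Hz band-pass for spike extraction.

    Assumption: 4th-order Butterworth in second-order sections, applied
    forwards and backwards; filter type and order are not given in the text."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

t = np.arange(FS) / FS                     # 1 s of samples
slow = np.sin(2 * np.pi * 5 * t)           # 5 Hz, LFP-like component to be removed
fast = 0.1 * np.sin(2 * np.pi * 1000 * t)  # 1 kHz component inside the spike band
y = spike_band(slow + fast)
```

Zero-phase filtering keeps spike peaks aligned in time, which matters when waveforms are later compared across conditions.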

To be included in the data set, a neuron had to be well isolated (L-ratio (ref. 
36) < 0.05) and recorded within 0.5 mm of a light-identified or putative dopamine
neuron, to ensure that it was recorded in VTA. Recording sites were also verified
histologically with electrolytic lesions using 10-15 s of 30 µA direct current.
Laser delivery. To identify neurons as dopaminergic or GABAergic, we used 
ChR2 to observe laser-triggered spikes. The optical fibre was coupled with a
diode-pumped solid-state laser with analogue amplitude modulation (Laserglow 
Technologies). At the beginning and end of each recording session, we delivered 
trains of ten blue (473 nm) light pulses, each 5 ms long, at 1, 10, 20 and 50 Hz, with 
an intensity of 5-20 mW mm−2 at the tip of the fibre. Spike shape was measured
using a broadband signal (0.1-9,000 Hz) sampled at 32 kHz. 
In the GABA stimulation experiment (Fig. 2), we used the same blue laser to
deliver 40 pulses (duration 5 ms, 40 Hz) during selected trials. In the GABA inhibition experiment (Fig. 3), we used one of two methods of laser delivery. For seven
mice (Extended Data Fig. 9a-c), we used an electronic shutter (Vincent Associates)
to deliver 1 s intervals of continuous green laser (532 nm, Laserglow Technologies),
with an intensity of ~50 mW mm−2 at the tip of the fibre. For a separate group of
two mice (Extended Data Figs 9d-f and 10), we instead modulated laser intensity in
an analogue fashion, beginning at zero intensity 0.5 s after odour onset, smoothly
increasing intensity to a peak of 50 mW mm−2 at reward delivery, and then gradually decreasing over the next 0.5 s. This ramping protocol was also used for the
behavioural experiment (Fig. 4), with a 473 nm laser (OptoEngine) and beam
splitter (Doric Lenses) to deliver blue light bilaterally. The ramping intensity
profile was chosen to approximate the response pattern of VTA GABA neurons.
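The ramping command can be sketched as a function of time from odour onset. The text says only that intensity rose "smoothly" and fell "gradually"; this sketch assumes straight-line segments, and it assumes reward at 1.5 s after odour onset (1 s odour plus 0.5 s delay, as in the GABA inhibition task). The function name `ramp_intensity` is hypothetical.

```python
import numpy as np

def ramp_intensity(t_s, reward_s=1.5, peak=50.0, start_s=0.5, fall_s=0.5):
    """Laser intensity (mW mm^-2) at times t_s (s from odour onset).

    Assumed piecewise-linear ramp: zero until start_s, rising to `peak` at
    reward_s, then falling back to zero over fall_s; zero outside that span."""
    return np.interp(t_s, [start_s, reward_s, reward_s + fall_s], [0.0, peak, 0.0],
                     left=0.0, right=0.0)

t = np.arange(0, 3000) / 1000.0  # 0-3 s in 1 ms steps
profile = ramp_intensity(t)
```

In a real rig the resulting array would be sent to the laser's analogue modulation input; any monotonic rise and fall (for example a raised cosine) could be swapped in without changing the timing landmarks.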
Data analysis. Peristimulus time histograms were constructed using 1 ms bins 
and then convolved with a function resembling a postsynaptic potential, 
(1 − exp(−t)) × exp(−t/20), for time t in milliseconds. Average firing rates in
response to reward were calculated using a 600 ms window after reward onset for 
the dopamine identification and GABA stimulation experiments, and a 500 ms 
window after reward onset for the GABA inhibition experiment. These windows 
were chosen to reflect the full duration of the neural response to reward. Window 
sizes ranging from 300 to 1,000 ms were attempted and gave qualitatively similar 
results. To calculate reward response, we subtracted baseline firing (averaged over 
1 s before trial onset). Calculating the baseline using different windows (for
example, 600 ms before reward onset) did not change the results. To ensure 
reliability, analyses of particular trial types only included neurons that were 
recorded during at least five presentations of that trial type. 
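The smoothing and response-window steps above can be sketched as follows. The analyses were done in Matlab; this NumPy version is illustrative only. Scaling the kernel to unit area (so smoothing preserves the mean rate) and the 200 ms kernel length are assumptions; the causal convolution means each smoothed value depends only on earlier spikes, like a postsynaptic potential.

```python
import numpy as np

def psp_kernel(length_ms=200):
    """Kernel (1 - exp(-t)) * exp(-t/20), t in ms, scaled to unit area (assumption)."""
    t = np.arange(length_ms, dtype=float)
    k = (1.0 - np.exp(-t)) * np.exp(-t / 20.0)
    return k / k.sum()

def smoothed_psth(trials_ms, window_ms=(-1000, 1000)):
    """Trial-averaged rate (spikes/s) in 1 ms bins, convolved causally with psp_kernel.

    trials_ms: list of 1-D arrays of spike times (ms) aligned to reward onset."""
    edges = np.arange(window_ms[0], window_ms[1] + 1)
    counts = np.mean([np.histogram(s, bins=edges)[0] for s in trials_ms], axis=0)
    rate = counts * 1000.0                                    # spikes per 1 ms bin -> spikes/s
    smoothed = np.convolve(rate, psp_kernel())[: rate.size]   # keep the causal part
    return edges[:-1], smoothed

def reward_response(t_ms, rate, baseline_rate, win_ms=600):
    """Mean rate in [0, win_ms) after reward onset minus a baseline rate."""
    mask = (t_ms >= 0) & (t_ms < win_ms)
    return rate[mask].mean() - baseline_rate
```

Here `baseline_rate` would be the mean rate over the 1 s before trial onset, computed separately because trial onset precedes reward by a task-dependent delay.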

To identify neurons as dopaminergic or GABAergic, we used the stimulus-associated spike latency test to determine whether light pulses significantly
changed a neuron’s spike timing (Extended Data Fig. 3). We used a significance 
value of P< 0.001. To ensure that spike sorting was not contaminated by light 
artefacts, we also calculated waveform correlations between spontaneous and 
light-evoked spikes, as described. All light-identified neurons had Pearson’s correlation coefficients > 0.9.
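The waveform check guards against light artefacts being sorted as spikes: a genuine light-evoked spike should have the same shape as the neuron's spontaneous spikes. A minimal sketch, assuming mean waveforms from all tetrode channels are concatenated before computing Pearson's r (how channels were combined is not stated). The latency test itself is not reproduced here.

```python
import numpy as np

def waveform_correlation(spont, evoked):
    """Pearson r between mean spontaneous and mean light-evoked waveforms.

    spont, evoked: arrays shaped (n_spikes, n_channels, n_samples); the mean
    waveforms of all channels are concatenated before correlating (assumption)."""
    a = np.asarray(spont, float).mean(axis=0).ravel()
    b = np.asarray(evoked, float).mean(axis=0).ravel()
    return np.corrcoef(a, b)[0, 1]

def passes_artefact_check(spont, evoked, threshold=0.9):
    """True if light-evoked spikes look like the neuron's spontaneous spikes."""
    return waveform_correlation(spont, evoked) > threshold
```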

In all three recording experiments, we identified putative dopamine and GABA 
neurons on the basis of their firing patterns through an unsupervised clustering 
approach (Extended Data Figs 2 and 9), similar to a previous study. Briefly,
receiver-operating characteristic (ROC) curves for each neuron were calculated 
by comparing the distribution of firing rates across trials in 100 ms bins (starting 
1 s before expected reward and ending 1 s after expected reward) to the distribution of baseline firing rates (1 s before trial onset). Principal component analysis
was calculated using the singular value decomposition of the area under the ROC. 
Hierarchical clustering was then done using the first three principal components 
of the area under the ROC using a Euclidean distance metric and complete 
agglomeration method. 
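The clustering pipeline can be sketched with NumPy and SciPy. Each auROC is computed directly as the probability that a bin's firing rate exceeds a baseline rate (ties count half), so 0.5 means no modulation. Column-centring the auROC matrix before the SVD (which makes the SVD equivalent to PCA) is an assumption, as are the function names; the linkage settings follow the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def auroc(x, y):
    """Area under the ROC curve for samples x versus y (0.5 = no difference)."""
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[None, :]
    return (np.sum(x > y) + 0.5 * np.sum(x == y)) / (x.size * y.size)

def auroc_matrix(binned_rates, baseline_rates):
    """binned_rates: per neuron, (n_trials, n_bins) rates in 100 ms bins.
    baseline_rates: per neuron, (n_trials,) rates in the 1 s before trial onset."""
    return np.array([[auroc(r[:, j], b) for j in range(r.shape[1])]
                     for r, b in zip(binned_rates, baseline_rates)])

def cluster_neurons(auc, n_pcs=3, n_clusters=3):
    """PCA via SVD, then complete-linkage clustering on the first n_pcs scores."""
    centred = auc - auc.mean(axis=0)           # assumption: centre each bin (column)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]          # projections onto the leading PCs
    z = linkage(scores, method="complete", metric="euclidean")
    return fcluster(z, t=n_clusters, criterion="maxclust")
```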

As described, this method produced three clusters: one with phasic excitation
to reward (type 1), one with sustained excitation to reward expectation (type 2), 
and one with sustained suppression to reward expectation (type 3). Type 1 neu- 
rons were classified as putatively dopaminergic. Forty out of 43 light-identified 
dopamine neurons fell into this cluster; the other three light-identified dopamine 
neurons showed phasic suppression to reward and were clustered as type 3. Since 
these three dopamine neurons showed qualitatively different responses than the 
others, they were not included in the data set. Note that although we focus on 
identified dopamine neurons, our main findings are identical if we include all
putative dopamine neurons (Extended Data Fig. 6b, c). 

Type 2 neurons were classified as putatively GABAergic. Of 14 identified GABA 
neurons, 11 were clustered as type 2; the other three were inhibited by reward and 
were clustered as type 3. Again, these three GABA neurons were not included in 
the data set. Unlike type 1 neurons, type 2 neurons did not respond to either 
expected or unexpected reward in a consistently size-dependent fashion 
(Extended Data Fig. 5). This contrasts with their delay activity, which increases 
with increasing reward expectation.

The distribution of neurons across mice for all recording experiments is pro- 
vided in Supplementary Table 1. 

To determine the dose-response of dopamine neurons and see whether 
expectation caused a subtractive or divisive effect (Fig. 1c), we based our analysis 
on a previous study. We first fitted a hyperbolic ratio function (Hill function) to
the unexpected reward data: 


$$f(r) = f_{\max}\,\frac{r^{0.5}}{r^{0.5} + \sigma^{0.5}} \qquad (1)$$


The function had two free parameters: f_max, the saturating firing rate; and σ, the
reward size that elicits half-maximum firing rate. We chose an exponent of 0.5
after fitting the data with exponents ranging from 0.1 to 2.0 (in steps of 0.1), and
finding the exponent with the lowest mean squared error. Note that the Hill 
function is not the only possible function that could fit our data. For example, 
the power function f(r) = a r^k, where a = 3.73 and k = 0.39, was also excellent.
However, this function does not saturate, so we thought it was less likely to 
represent neuronal responses. Our conclusions do not depend on the exact func- 
tion chosen to fit the data. 
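The fitting procedure for equation (1) can be sketched with SciPy: for each candidate exponent, fit f_max and σ by least squares, then keep the exponent with the lowest mean squared error. The finite parameter bounds (which stop poorly matched exponents from drifting without limit) and the skip on non-convergence are assumptions added for robustness; the original fits were done in Matlab.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(r, f_max, sigma, n=0.5):
    """Hyperbolic ratio (Hill) function with exponent n."""
    return f_max * r**n / (r**n + sigma**n)

def fit_hill(r, rate, exponents=np.round(np.arange(0.1, 2.01, 0.1), 1)):
    """Return (mse, exponent, [f_max, sigma]) for the best-fitting exponent."""
    r, rate = np.asarray(r, float), np.asarray(rate, float)
    upper = [10 * rate.max(), 100 * r.max()]  # assumption: finite bounds on f_max, sigma
    best = None
    for n in exponents:
        model = lambda x, fm, s, n=n: hill(x, fm, s, n)  # exponent held fixed
        try:
            popt, _ = curve_fit(model, r, rate, p0=[rate.max(), np.median(r)],
                                bounds=([0.0, 1e-6], upper))
        except RuntimeError:
            continue  # skip exponents whose fit does not converge
        mse = np.mean((hill(r, *popt, n) - rate) ** 2)
        if best is None or mse < best[0]:
            best = (mse, n, popt)
    return best
```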

After fitting the unexpected reward data, we explored what simple transforma- 
tion could best mimic the effect of expectation. We tested four options: input 
subtraction, input division, output subtraction, and output division (Extended 
Data Fig. 6a). Specifically, we evaluated the following four models:
In each case, we used the f_max and σ values determined by the unexpected
reward data. The only new parameter was the expectation factor E, which we fitted 
separately for each of the four models. Output subtraction consistently gave the 
best fit (lowest mean squared error), for the population and most individual 
neurons. The next best model was generally output division. We statistically 
compared model-fits using a bootstrapping analysis: we resampled the data 
1,000 times and determined for each resample the mean squared error for both 
output subtraction and output division. We calculated the P value by counting the 
number of resamples when the mean squared error was better for output division 
than for output subtraction (for example, if 1 resample out of 1,000 preferred 
output division over output subtraction, P = 0.001; Extended Data Fig. 6c). 
These steps were repeated for putative dopamine neurons in the GABA stimu- 
lation experiment (Fig. 2d). 
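The model comparison and bootstrap can be sketched as follows. The exact algebraic forms of the four transformations are assumptions here (input subtraction f(r − E), input division f(r/E), output subtraction f(r) − E, output division f(r)/E), as are the search bounds for E and the choice to resample neurons with f_max and σ held fixed. The P value follows the text: the fraction of resamples in which output division fits better than output subtraction.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def hill(r, f_max, sigma, n=0.5):
    r = np.maximum(r, 0.0)  # input subtraction can push r below zero
    return f_max * r**n / (r**n + sigma**n)

# Assumed forms of the four transformations (E = expectation factor) and bounds on E.
MODELS = {
    "input_subtraction":  (lambda r, E, p: hill(r - E, *p), (0.0, 20.0)),
    "input_division":     (lambda r, E, p: hill(r / E, *p), (1.0, 100.0)),
    "output_subtraction": (lambda r, E, p: hill(r, *p) - E, (0.0, 50.0)),
    "output_division":    (lambda r, E, p: hill(r, *p) / E, (1.0, 100.0)),
}

def fit_model(name, r, expected, params):
    """Fit E for one model with (f_max, sigma) fixed; return (E, mse)."""
    f, bounds = MODELS[name]
    res = minimize_scalar(lambda E: np.mean((f(r, E, params) - expected) ** 2),
                          bounds=bounds, method="bounded")
    return res.x, res.fun

def bootstrap_p(r, expected_by_neuron, params, n_boot=1000, seed=0):
    """Fraction of neuron resamples in which output division beats output subtraction."""
    rng = np.random.default_rng(seed)
    n = expected_by_neuron.shape[0]
    wins = 0
    for _ in range(n_boot):
        sample = expected_by_neuron[rng.integers(0, n, n)].mean(axis=0)
        _, mse_sub = fit_model("output_subtraction", r, sample, params)
        _, mse_div = fit_model("output_division", r, sample, params)
        wins += mse_div < mse_sub
    return wins / n_boot
```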

As a complementary analysis to determine whether expectation had a subtractive or divisive effect on dopamine reward responses, we calculated the difference
between unexpected and expected reward responses for different reward sizes 
(Fig. 1d). We then ran a linear regression to determine whether the slope of 
this difference was significantly different from zero. A slope of zero would be 
consistent with output subtraction, as expectation would have the same effect 
on all responses. A slope greater than zero would be consistent with output 
division, as expectation would have a larger effect on larger responses. All but five 
of the light-identified dopamine neurons had a slope no different from zero 
(Extended Data Fig. 4c). 
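The slope test reduces to an ordinary linear regression of the expectation effect on reward size. A minimal sketch using `scipy.stats.linregress`; whether per-trial or per-size averages were regressed is not stated, so the inputs here are simply paired arrays.

```python
import numpy as np
from scipy.stats import linregress

def expectation_slope(reward_sizes, unexpected, expected):
    """Slope and P value of (unexpected - expected) response regressed on reward size.

    Slope near zero: consistent with output subtraction (same shift at every size).
    Positive slope: consistent with output division (larger shift for larger responses)."""
    diff = np.asarray(unexpected, float) - np.asarray(expected, float)
    fit = linregress(np.asarray(reward_sizes, float), diff)
    return fit.slope, fit.pvalue
```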

In our GABA stimulation and inhibition experiments, we wanted to ensure 
that laser delivery affected phasic dopamine responses in addition to shifting 
baseline dopamine activity. First, we identified putative dopamine neurons that 
did not significantly change their baseline firing upon laser delivery. To do so, we 
compared firing rates in the 0.5 s before reward delivery on laser trials versus no- 
laser trials. Neurons with P>0.05 (Wilcoxon rank-sum) were identified as 
unaffected by laser delivery. In both the GABA stimulation and GABA inhibition 
experiments, these neurons continued to be affected at the time of reward 
(Extended Data Fig. 7a, e). Second, we recorded from putative dopamine neu- 
rons while manipulating VTA GABA activity outside the task (Extended Data 
Fig. 7c, g). This gave us an unbiased sense of how VTA GABA stimulation or 
inhibition affected dopamine baseline responses. We then subtracted these laser- 
alone trials from trials where laser was delivered during reward (Extended Data 
Fig. 7b, f). Any remaining change at the time of reward should not be due to a 
baseline shift. 
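Both controls can be expressed in a few lines. A sketch assuming per-trial firing rates for the 0.5 s pre-reward window and PSTHs on a common time base; `scipy.stats.ranksums` implements the Wilcoxon rank-sum test named in the text, while the function names are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums

def baseline_unaffected(pre_laser, pre_no_laser, alpha=0.05):
    """True if pre-reward firing (0.5 s window) does not differ between laser
    and no-laser trials (Wilcoxon rank-sum, P > alpha)."""
    return ranksums(pre_laser, pre_no_laser).pvalue > alpha

def laser_corrected_response(reward_laser_psth, laser_alone_psth):
    """Subtract the laser-alone time course from laser-plus-reward trials, so any
    remaining change at reward time is not explained by a baseline shift."""
    return np.asarray(reward_laser_psth, float) - np.asarray(laser_alone_psth, float)
```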

Interestingly, the baseline shift may have been an artefact of the type of laser 
stimulation we applied. In a separate experiment (n = 2 mice, Extended Data Fig. 
9d-f), we applied the laser so that light intensity would ramp up rather than 
remain constant over the course of a trial, more closely mimicking the physio- 
logical responses of VTA GABA neurons. We found that this ramping stimulation 
successfully inhibited putative VTA GABA neurons (P = 0.001, t-test, Extended 
Data Fig. 10a, b) and increased reward responses in putative dopamine neurons 
(P< 0.001, t-test, Extended Data Fig. 10c, d) without causing a baseline shift. 
To assess how well VTA GABA stimulation mimics odour expectation, we also
directly compared the magnitude of change in dopamine responses in both experi- 
ments. In the odour-based experiment (Fig. 1), the average suppression of dopa- 
mine reward responses was 52.7%, compared with 43.5% for the VTA GABA 
stimulation experiment (Fig. 2; P< 0.05, t-test). This difference may be accounted 
for by variation among putative dopamine neurons in their response to laser. 
Although 40 out of 45 putative dopamine neurons were suppressed by GABA 
stimulation, 5 were activated, perhaps through disynaptic disinhibition, as VTA 
GABA neurons are known to synapse onto each other as well as onto dopamine 
neurons. In addition, there may be other neurons, besides VTA GABA neurons,
that help suppress dopamine responses when reward is expected. 

Although we focus on changes in the magnitude of dopamine neuron responses, 
this was not the only effect of reward expectation. Notably, the latency to peak 
response was also extended, from an average of 67.6 ms to 95.4 ms (P = 0.001,
t-test). The latency increased in 37 out of 40 dopamine neurons that we recorded. 
The downstream consequences of this change in latency remain to be elucidated. 

Comparisons were performed with t-tests (for population data) or Wilcoxon 
rank-sum tests (for individual neuron data), with corrections for multiple com- 
parisons (Bonferroni or Tukey). Correlations were done with Pearson’s rho. P 
values less than 0.05 were considered significant, unless otherwise noted. Given 
pilot data showing effects of optogenetic manipulation of about 2 spikes per 
second, with variability of about 3 spikes per second, 36 neurons were required 
for 80% power to detect the effect. Given about ten neurons of each type per 
mouse, we aimed for at least four mice per experiment. Analyses were done with 
Matlab (Mathworks). 
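The stated sample size can be reproduced with the standard normal-approximation formula for a two-sided, two-sample comparison, n = 2((z₁₋α/₂ + z_power) × SD / effect)². This is one calculation consistent with the figure of 36 neurons; the text does not say which test or formula was used. Python's standard-library `statistics.NormalDist` supplies the normal quantiles.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group, two-sided two-sample comparison."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / effect) ** 2)

print(n_per_group(effect=2.0, sd=3.0))  # 36
```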

In the behavioural experiment (Fig. 4), the strength of the learned association 
between each odour and reward was estimated by counting the number of 
anticipatory licks over the 2 s from odour onset to reward delivery. For the analysis 
in Fig. 4b and Extended Data Fig. 8d, e, we excluded data from probe trials. 
Population results were examined using a mixed-effects linear model. The fixed 
effects included trial type and a binary variable indicating whether the session 
included laser delivery. The random effect was mouse identity. The outcome of 
interest was an interaction between trial type and laser. Results were robust to 
different choices of window for counting anticipatory licks. 
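The lick-count measure and the odour D/odour B ratio shown in Fig. 4c can be sketched as below, assuming lick and odour-onset timestamps in seconds. The mixed-effects model itself is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def anticipatory_licks(lick_times_s, odour_onsets_s, window_s=2.0):
    """Number of licks in [odour onset, odour onset + window_s) for each trial."""
    licks = np.sort(np.asarray(lick_times_s, float))
    onsets = np.asarray(odour_onsets_s, float)
    return (np.searchsorted(licks, onsets + window_s, side="left")
            - np.searchsorted(licks, onsets, side="left"))

def d_over_b_ratio(counts, odours):
    """Ratio of mean anticipatory licks for odour D versus odour B."""
    counts = np.asarray(counts, float)
    odours = np.asarray(odours)
    return counts[odours == "D"].mean() / counts[odours == "B"].mean()
```

A ratio below 1 indicates that the mice anticipate less reward after odour D than after odour B, even though both deliver the same volume.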
Immunohistochemistry. After recording for 4-8 weeks, mice were given an
overdose of ketamine/medetomidine, exsanguinated with saline, and perfused
with 4% paraformaldehyde. Brains were cut in 100 µm coronal sections on a
vibratome and immunostained with antibodies to tyrosine hydroxylase (AB152,
1:1,000, Millipore) to visualize dopamine neurons, and stained with 4′,6-diamidino-2-phenylindole (DAPI, Vectashield) to visualize nuclei. Virus expression was determined
through eYFP fluorescence. Slides were examined to verify that the optic fibre 
track was among VTA dopamine neurons and in a region expressing the virus. For 
the GABA inhibition experiment, two Vgat-tdTomato mice were injected with 
AAV-FLEX-ArchT-GFP to determine the selectivity and efficiency of ArchT 
expression in VTA GABA neurons (Extended Data Fig. le-g). One mouse was 
injected with AAV serotype 1 and the other with AAV serotype 8. For the figure, 
brightness and contrast were adjusted in Adobe Photoshop. 
Extended Data Figure 1 | Recording sites and ArchT expression.

a–d, Schematic of recording locations for mice used in the dopamine
identification task (a, n = 5), the GABA stimulation task (b, n = 7), the GABA
inhibition task (c, n = 9), and the behavioural task (d, n = 12). b, Red,
experimental mice expressing ChR2 in VTA GABA neurons (n = 5). Blue,
control mice expressing GFP in VTA GABA neurons (n = 2). c, Red, mice in
which laser was delivered at continuous intensity (n = 7). Blue, mice in
which laser was delivered with ramping intensity (n = 2). d, Red, experimental
mice expressing ChR2 in VTA GABA neurons (n = 6). Blue, control mice
expressing GFP in VTA GABA neurons (n = 6). e–g, Selectivity and efficiency
of ArchT expression. e, Representative merged image (one of 30 z-stacks).
Magenta, Vgat-tdTomato; green, ArchT-GFP. Open arrow, neuron expressing
Vgat-tdTomato but not ArchT-GFP. Closed arrow, neuron expressing both
Vgat-tdTomato and ArchT-GFP. Scale bar, 10 μm. f, Selectivity of infection
to GABA neurons: percentage of ArchT-GFP-expressing neurons (n = 131
neurons for AAV1 and 165 neurons for AAV8) that were positive for
Vgat-tdTomato. Filled bars, Vgat-tdTomato mouse injected with
AAV1-FLEX-ArchT-GFP. Empty bars, Vgat-tdTomato mouse injected with
AAV8-FLEX-ArchT-GFP. g, Efficiency of infection: percentage of
Vgat-tdTomato-expressing neurons (n = 278 neurons for AAV1 and
283 neurons for AAV8) that were positive for ArchT-GFP.
Extended Data Figure 2 | Neuron classification for dopamine identification
and GABA stimulation experiments. a–c, Dopamine identification
experiment. d–f, ChR2-expressing animals in GABA stimulation experiment.
g–i, GFP-expressing control animals in GABA stimulation experiment.
a, d, g, Responses of all VTA neurons recorded in the tasks. Each row reflects
the area under the ROC curve (auROC) values for a single neuron in the second
before and after delivery of expected reward. Baseline is taken as 1 s before
odour onset. Yellow, increase from baseline; cyan, decrease from baseline.
Light-identified neurons are denoted by an asterisk to the left of each column.
b, e, h, The first three principal components of the area under the ROC curves.
These values were used for unsupervised hierarchical clustering, as shown in the
dendrogram on the right. c, f, i, Average firing rates for the three clusters of
neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay
and then reward delivery.
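Each row of the heat maps in a, d and g is a series of auROC values comparing a neuron's firing in one time bin with its firing in the 1 s pre-odour baseline across trials. The auROC equals the probability that a randomly chosen test-bin count exceeds a randomly chosen baseline count (ties counting half), which the sketch below computes directly. Bin width, trial counts and all names here are assumptions, and the subsequent principal component and hierarchical clustering steps are not shown.

```python
from typing import Sequence


def auroc(test: Sequence[float], baseline: Sequence[float]) -> float:
    """Area under the ROC curve separating test-bin counts from baseline counts.

    Computed as P(X > Y) + 0.5 * P(X == Y) over all (test, baseline) pairs,
    i.e. the Mann-Whitney U statistic divided by the number of pairs.
    """
    score = 0.0
    for x in test:
        for y in baseline:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5  # ties count half
    return score / (len(test) * len(baseline))


def auroc_row(binned_counts: Sequence[Sequence[float]],
              baseline: Sequence[float]) -> list[float]:
    """One heat-map row: auROC of each time bin (across trials) versus baseline."""
    return [auroc(bin_counts, baseline) for bin_counts in binned_counts]


# Hypothetical spike counts across four trials: baseline bin, then three time bins.
baseline = [2, 3, 2, 3]
bins = [[2, 3, 3, 2], [6, 7, 5, 8], [0, 1, 0, 1]]
print(auroc_row(bins, baseline))  # [0.5, 1.0, 0.0]
```

Values above 0.5 correspond to an increase from baseline (yellow) and values below 0.5 to a decrease (cyan). The pairwise loop is quadratic in trial number, which is acceptable for typical trial counts.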
Extended Data Figure 3 | Light identification of dopamine and GABA
neurons. a, Raw signal from one example light-identified dopamine neuron.
Blue bars, light pulses. b, For the same neuron, mean waveforms for
spontaneous (black) and light-evoked (blue) action potentials. c, For the same
neuron, raster plots for 20 Hz (left) and 50 Hz (right) laser stimulation. Each
row is one trial of laser stimulation. d, Histogram of log P values for each neuron
recorded in the dopamine identification experiment (n = 170). The P values
were derived from the stimulus-associated spike latency test (see Methods).
Neurons with P < 0.001 and waveform correlations > 0.9 were considered
identified (filled bars). e, f, For light-identified neurons, probability of
spiking (e) and latency to first spike (f) after laser pulses at different frequencies.
Orange circles, mean across neurons. g, Histogram of mean latencies (left)
and latency standard deviations (right) in response to laser stimulation for all
light-identified dopamine neurons in the variable-reward task. h–n, Same
conventions as a–g, but for neurons recorded in the GABA stimulation task
(n = 102).
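The identification quantities in panels d–f can be expressed as a few small functions: per-pulse first-spike latency and spike probability, and the Pearson correlation between mean spontaneous and light-evoked waveforms. This is an illustrative sketch: the 10 ms response window and all function names are assumptions, and the stimulus-associated spike latency test itself is not reimplemented, so its P value is taken as an input.

```python
import math
from typing import Optional, Sequence


def first_spike_latencies(pulse_times: Sequence[float], spike_times: Sequence[float],
                          window: float = 0.010) -> list[Optional[float]]:
    """Latency (s) of the first spike within `window` s of each laser pulse, else None."""
    latencies: list[Optional[float]] = []
    for pulse in pulse_times:
        after = [s - pulse for s in spike_times if pulse <= s < pulse + window]
        latencies.append(min(after) if after else None)
    return latencies


def spike_probability(latencies: Sequence[Optional[float]]) -> float:
    """Fraction of pulses followed by at least one spike inside the window."""
    return sum(lat is not None for lat in latencies) / len(latencies)


def waveform_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two mean waveforms sampled at the same time points."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def is_light_identified(salt_p: float, r: float) -> bool:
    """Criteria stated in the legend: P < 0.001 and waveform correlation > 0.9."""
    return salt_p < 0.001 and r > 0.9


lat = first_spike_latencies([0.0, 0.05, 0.10], [0.004, 0.056, 0.2])
print(spike_probability(lat))  # the third pulse has no spike within 10 ms
r = waveform_correlation([0, -1, 2, 0], [0, -0.9, 1.8, 0.1])
print(is_light_identified(0.0002, r))
```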
Extended Data Figure 4 | Individual neuron analysis from all recording
experiments. a–c, Results from dopamine identification experiment (Fig. 1).
d, e, Results from GABA stimulation experiment (Fig. 2). f–i, Results from
GABA inhibition experiment (Fig. 3). a, Raster plots (top and middle) and
firing rate (bottom) of representative dopamine neuron in response to
unexpected (orange) or temporally expected (black) reward. ***P < 0.001,
t-test. b, For the same neuron, responses (mean ± s.e.m. across trials) to each
reward size. Orange line, fit for unexpected reward. Dotted black line, divisive
transformation. Solid black line, subtractive transformation. c, Individual
neuron regression slopes for the analysis in Fig. 1d. Empty bars, slope not
different from zero (P > 0.05). Filled bars, P < 0.05. Triangle, mean slope.
d, e, Firing rate of representative VTA GABA (d) and putative dopamine (e)
neuron with (blue) and without (black) ChR2 stimulation. Light blue box, laser
delivery. f, g, Firing rate of representative VTA GABA (f) and putative
dopamine (g) neuron during odour B trials with (green) or without (black) laser
delivery. h, i, Histogram of putative GABA (h) and dopamine (i) neuron
responses to laser delivery. Filled bars, significant effect of laser (P < 0.05,
Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean.
Extended Data Figure 5 | VTA GABA activity does not vary consistently
with reward size. a–c, Putative GABA neurons in the dopamine identification
experiment (Fig. 1). d–f, Putative GABA neurons in the GABA stimulation
experiment (Fig. 2). a, b, Average firing rate of putative GABA neurons to
unexpected (a) or temporally expected (b) rewards of various sizes.
c, Population responses (mean ± s.e.m. across putative GABA neurons) for
different reward sizes. Orange, unexpected reward. Black, temporally expected
reward. Responses were averaged over a 600 ms window after reward delivery.
d, e, Average firing rate of putative GABA neurons to rewards of various
sizes, delivered without (d) or with (e) optogenetic GABA stimulation.
f, Population responses (mean ± s.e.m. across putative GABA neurons) for
different reward sizes. Blue, reward with laser stimulation. Black, reward
without laser stimulation. Responses were averaged over a 600 ms window after
reward delivery.
Extended Data Figure 6 | Statistical test for subtraction versus division.

a, To understand how dopamine neurons compute reward prediction error, we
first determined how dopamine neurons respond to various sizes of unexpected
reward (schematized as orange curves). We then taught the mice to expect
reward and observed how expectation shifted this dose–response (black
curves). We modelled four types of shift: output subtraction (top left), input
subtraction (bottom left), output division (top right), and input division
(bottom right). Output subtraction was consistently the best fit. For equations,
see Methods. Analysis adapted from a previous study. b–e, Results from
dopamine identification experiment. f–i, Results from GABA stimulation
experiment. b, c, Results from all putative dopamine neurons (n = 84).
***P < 0.001, bootstrap. d, e, Results from light-identified dopamine neurons
(n = 40). ***P < 0.001, bootstrap. f, g, Results from putative dopamine
neurons in the GABA stimulation experiment (n = 45). *P < 0.05, bootstrap.
h, i, Results from putative dopamine neurons in the GABA stimulation
experiment, subtracting the 500 ms period immediately before reward delivery.
This takes into account the laser-induced baseline shift in dopamine responses.
*P < 0.05, bootstrap. b, d, f, h, Average responses (mean ± s.e.m. across
neurons) to different sizes of reward, with fits for output subtraction (solid line)
and output division (dotted line). c, e, g, i, Results of bootstrapping analysis.
For each resample, we compared the mean squared error for the subtractive fit
with the mean squared error for the divisive fit. Negative numbers favour
subtraction. P values were calculated as the proportion of resamples in which
division was a better fit than subtraction.
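The bootstrap in c, e, g and i can be sketched as follows: resample neurons with replacement, average their response curves, fit the expected-reward responses either as a subtraction from or as a division of the unexpected-reward responses, and record the difference in mean squared error. This is a simplified sketch, not the authors' code: it fits directly to the mean unexpected responses at each reward size (rather than to a fitted dose–response function, which the Methods may use), covers only the two output models, and all names are hypothetical.

```python
import random
from typing import Sequence

Curve = Sequence[float]  # mean response at each reward size for one neuron


def _mean_curve(curves: Sequence[Curve]) -> list[float]:
    """Average response curves across neurons, point by point."""
    return [sum(values) / len(values) for values in zip(*curves)]


def _fit_mse(unexpected: Curve, expected: Curve) -> tuple[float, float]:
    """MSE of least-squares output-subtraction and output-division fits.

    Subtraction: expected ~= unexpected - c  (c is the mean difference)
    Division:    expected ~= unexpected / d  (fitted as a gain k = 1/d through the origin;
                 assumes unexpected responses are not all zero)
    """
    n = len(unexpected)
    c = sum(u - e for u, e in zip(unexpected, expected)) / n
    mse_sub = sum((e - (u - c)) ** 2 for u, e in zip(unexpected, expected)) / n
    k = sum(u * e for u, e in zip(unexpected, expected)) / sum(u * u for u in unexpected)
    mse_div = sum((e - k * u) ** 2 for u, e in zip(unexpected, expected)) / n
    return mse_sub, mse_div


def bootstrap_sub_vs_div(unexpected: Sequence[Curve], expected: Sequence[Curve],
                         n_resamples: int = 1000, seed: int = 0) -> tuple[list[float], float]:
    """Resample neurons with replacement; return MSE differences and a P value.

    Each difference is mse_sub - mse_div, so negative numbers favour subtraction.
    P is the proportion of resamples in which division fits better.
    """
    rng = random.Random(seed)
    n_neurons = len(unexpected)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n_neurons) for _ in range(n_neurons)]
        u = _mean_curve([unexpected[i] for i in idx])
        e = _mean_curve([expected[i] for i in idx])
        mse_sub, mse_div = _fit_mse(u, e)
        diffs.append(mse_sub - mse_div)
    p = sum(d > 0 for d in diffs) / n_resamples
    return diffs, p


# Toy data: five neurons, four reward sizes; expectation subtracts 2 from every response.
unexp = [[2 + j, 4 + j, 6 + j, 8 + j] for j in range(5)]
expect = [[u - 2 for u in curve] for curve in unexp]
_, p = bootstrap_sub_vs_div(unexp, expect, n_resamples=200)
print(p)
```

Resampling neurons (rather than trials) treats each neuron as the unit of replication, which matches the population-level comparison described in the legend.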
Extended Data Figure 7 | Laser effect is more than a baseline shift.

a–d, Results from GABA stimulation experiment. e–h, Results from GABA
inhibition experiment. a, Firing rate (mean ± s.e.m.) of putative dopamine
neurons that did not show a significant baseline shift. ***P < 0.001, t-test. b, To
visualize whether GABA stimulation preferentially affected phasic dopamine
responses in addition to baseline firing rates, we took the activity in Fig. 2c and
subtracted the trials when laser was delivered alone. Any remaining change at
the time of reward could not be due to a baseline shift. **P = 0.01, t-test.
c, Firing rate (mean ± s.e.m.) of putative dopamine (left) and GABA (right)
neurons in trials where laser was delivered in the absence of reward. This
dopamine response was subtracted to calculate the firing rates in b.
d, Histogram of the phasic effect of GABA stimulation. The values were
calculated by subtracting the black line from the blue line in b. Empty bars,
slope not different from zero (P > 0.05, Wilcoxon rank-sum test). Filled bars,
slope different from zero (P < 0.05). Triangle, mean (P < 0.001, t-test).
e–h, Same conventions as a–d, but for the GABA inhibition experiment.
***P < 0.001, t-test.
Extended Data Figure 8 | Behavioural performance on all four experiments.
a, In the dopamine identification task (Fig. 1), lick rates (mean ± s.e.m. across
sessions) for odours predicting reward (black) or nothing (grey). b, In the
GABA stimulation task (Fig. 2), lick rates (mean ± s.e.m. across sessions) for
reward alone (black), reward + GABA stimulation (blue), or GABA
stimulation alone (orange). c, In the GABA inhibition task (Fig. 3), lick rates
(mean ± s.e.m. across sessions) for the odours predicting reward with 90%
probability (black) and 10% probability (grey). Green laser was delivered to
inhibit VTA GABA neurons in 25% of reward (green) and nothing (orange)
trials. d, e, In the bilateral stimulation experiment (Fig. 4), anticipatory licks
(mean ± s.e.m. across mice) for mice injected with ChR2 (d) and GFP (e). Grey
bars, odour B; blue or orange bars, odour D. Left, last three training sessions
before odour D was paired with laser; middle, last three sessions with laser
delivery (excluding probe trials); right, last three sessions after laser was turned
off. **P < 0.01; ***P < 0.001; paired t-test.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 9 panels not reproduced. Recoverable labels: auROC (increase/decrease); PC; Reward, Nothing; Firing rate (spikes/s); Neuron; Time − reward onset (s); Time − odour onset (s). Cluster sizes: Type 1 (n = 74), Type 2 (n = 107), Type 3 (n = 52); Type 1 (n = 37), Type 2 (n = 31), Type 3 (n = 17).]


Extended Data Figure 9 | Neuron classification for GABA inhibition
experiment. a–c, Mice in which laser was delivered with continuous intensity.
d–f, Mice in which laser was delivered with ramping intensity. a, d, Responses
of all VTA neurons recorded in the tasks. Each row reflects the area under
the ROC curve (auROC) values for a single neuron in the second before and
after delivery of expected reward. Baseline is taken as one second before odour
onset. Yellow, increase from baseline; cyan, decrease from baseline. b, e, The
first three principal components of the area under the ROC curves. These values
were used for unsupervised hierarchical clustering, as shown in the dendrogram
on the right. c, f, Average firing rates for the three clusters of neurons in each
task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward
delivery.
Extended Data Figure 10 | Ramping laser stimulation eliminates baseline
shift. a, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during
odour B trials with (green) or without (black) ramping laser delivery.
***P < 0.001, t-test. b, Histogram of putative GABA neuron responses to laser
delivery. Responses were averaged over the entire duration of the laser.
Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum test); empty
bars, P > 0.05. Triangle, mean (P < 0.001, t-test). c, Firing rate (mean ± s.e.m.)
of putative dopamine neurons with (green) or without (black) ramping GABA
inhibition. ***P < 0.001, t-test. d, Histogram of putative dopamine neuron
responses to laser delivery. Responses were averaged over the 0.5 s window
after reward delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon
rank-sum test); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test).
Evidence for human transmission of amyloid-β
pathology and cerebral amyloid angiopathy


Zane Jaunmuktane, Simon Mead, Matthew Ellis, Jonathan D. F. Wadsworth, Andrew J. Nicoll, Joanna Kenny,
Francesca Launchbury, Jacqueline Linehan, Angela Richard-Loendt, A. Sarah Walker, Peter Rudge,
John Collinge & Sebastian Brandner


More than two hundred individuals developed Creutzfeldt-Jakob
disease (CJD) worldwide as a result of treatment, typically in
childhood, with human cadaveric pituitary-derived growth hormone
contaminated with prions. Although such treatment ceased in
1985, iatrogenic CJD (iCJD) continues to emerge because of the
prolonged incubation periods seen in human prion infections.
Unexpectedly, in an autopsy study of eight individuals with
iCJD, aged 36–51 years, we found moderate to severe grey
matter and vascular amyloid-β (Aβ) pathology in four. The Aβ deposition
in the grey matter was typical of that seen in Alzheimer's disease,
and Aβ in the blood vessel walls was characteristic of cerebral
amyloid angiopathy and did not co-localize with prion protein
deposition. None of these patients had pathogenic mutations,
APOE ε4 or other high-risk alleles associated with early-onset
Alzheimer's disease. Examination of a series of 116 patients with
other prion diseases from a prospective observational cohort
study showed minimal or no Aβ pathology in cases of similar
age range, or a decade older, without APOE ε4 risk alleles. We also
analysed pituitary glands from individuals with Aβ pathology and
found marked Aβ deposition in multiple cases. Experimental seeding
of Aβ pathology has been previously demonstrated in primates
and transgenic mice by central nervous system or peripheral
inoculation with Alzheimer's disease brain homogenate. The marked
deposition of parenchymal and vascular Aβ in these relatively
young patients with iCJD, in contrast with other prion disease
patients and population controls, is consistent with iatrogenic
transmission of Aβ pathology in addition to CJD and suggests that
healthy exposed individuals may also be at risk of iatrogenic
Alzheimer's disease and cerebral amyloid angiopathy. These
findings should also prompt investigation of whether other known
iatrogenic routes of prion transmission may also be relevant to
Aβ and other proteopathic seeds associated with neurodegenerative
and other human diseases.

Human transmission of prion disease has occurred as a result of a
range of medical and surgical procedures worldwide as well as by
endocannibalism in Papua New Guinea, with incubation periods that
can exceed five decades. A well-recognized iatrogenic route of
transmission was by treatment of persons of short stature with
preparations of human growth hormone, extracted from large pools of
cadaver-sourced pituitary glands, some of which were inadvertently
prion-contaminated. Such treatments commenced in 1958 and ceased
in 1985 following the reports of the occurrence of CJD amongst
recipients. A review of all 1,848 patients who were treated with
cadaveric-derived human growth hormone (c-hGH) in the United Kingdom
from 1959 through 1985 found that 38 had developed CJD by the year
2000, with a peak incubation period of 20 years. Multiple preparations
using different extraction methods were used over this period and
patients received batches from several preparations. One preparation
(Wilhelmi) was common to all patients who developed iCJD, and
size-exclusion chromatography, used in non-Wilhelmi preparation
methods, may have reduced prion contamination. As of 2012, a total
of 450 cases of iatrogenic CJD had been recognized worldwide after
treatment with c-hGH or gonadotropin (226 cases), transplantation of
dura mater (228) or cornea (2), and neurosurgery (4) or
electroencephalography recording using invasive medical devices (2). In France,
119/1,880 (6.3%) recipients developed iCJD, in the UK 65/1,800
(3.6%) and in the USA 29/7,700 (0.4%).
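The percentages quoted above are simple attack rates: cases divided by recipients. A short check using only the counts given in the text reproduces the rounded values; the function name is illustrative.

```python
def attack_rate(cases: int, recipients: int) -> float:
    """Percentage of c-hGH recipients who developed iCJD."""
    return 100 * cases / recipients


# Counts as quoted in the paragraph above.
cohorts = {"France": (119, 1_880), "UK": (65, 1_800), "USA": (29, 7_700)}
for country, (cases, recipients) in cohorts.items():
    print(f"{country}: {cases}/{recipients:,} = {attack_rate(cases, recipients):.1f}%")
# France: 119/1,880 = 6.3%
# UK: 65/1,800 = 3.6%
# USA: 29/7,700 = 0.4%
```

The same calculation gives about 2.1% for the 38 of 1,848 UK recipients who had developed CJD by 2000, compared with 3.6% in the later UK count.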

Since 2008, most UK patients with prion disease have been recruited
into the National Prion Monitoring Cohort study, including 22 of
24 recent patients with iatrogenic CJD (iCJD) related to treatment with
c-hGH over this period, all of whom necessarily have very long
incubation periods. Of this group of patients with iCJD, eight patients
(referred to as nos 1–8, Supplementary Information), aged 36–51 years, with
an incubation period from first treatment to onset of 27.9–38.9 years
(mean 33 years) and from last treatment to onset of 18.8–30.8 years
(mean 25.5 years), underwent autopsy with extensive brain tissue
sampling at our hospital. In all eight brain samples we confirmed prion
disease with abnormal prion protein labelling of the neuropil and
perineuronal network and, in most cases, microplaques, as described
previously. However, four (nos 4, 5, 6, 8) of the eight patients with iCJD
also showed substantial amyloid-β (Aβ) deposition in the central nervous
system parenchyma by histology (Fig. 1) and immunoblotting
(Fig. 2). A further two brain samples (nos 1, 3) had focal Aβ pathology
in one of the brain regions; one showed Aβ entrapment in PrP plaques
and only one was entirely negative for Aβ. Furthermore, there was
widespread cortical and leptomeningeal cerebral Aβ angiopathy
(CAA) in three patients (nos 4, 6, 8) and focal CAA in one patient
(no. 5) (Fig. 1). Such pathology is extremely rare in this age range
(10/290 in the equivalent 36–50-year age stratum without CJD;
P = 0.0002, Fisher's exact test). None of our patients with iCJD had
pathogenic mutations in the prion protein gene (PRNP). We used a custom
next-generation sequencing panel to exclude mutations in any of 16
other genes associated with early-onset Alzheimer's disease, CAA or
other neurodegenerative disorders, and none carried APOE ε4 or
TREM2 R47H alleles (Supplementary Table 2). Although such
observations are unprecedented in our wide experience of human prion
diseases, we nevertheless considered whether prion disease itself might
predispose to, or accelerate, Aβ pathology, for example by cross-seeding
of protein aggregation or overload of clearance mechanisms for
misfolded proteins. We therefore compared the Aβ pathology in the
iCJD cohort with that of a cohort of 116 patients with other prion
diseases who had undergone autopsy: sporadic CJD (sCJD) (n = 85,
age 42–83), variant CJD (n = 2, age 25 and 36) and inherited prion
diseases (IPD) (n = 29, age 29–86). None of the patients in the control
cohorts had comparable Aβ pathology (Consortium to Establish a
Registry for Alzheimer's Disease (CERAD) score, P = 0.001; CAA,
P = 0.005; topographical Aβ score, P = 0.02; and cumulative Aβ score,
P = 0.02 (rank-sum test); and digital Aβ quantification, P = 0.04 (t-test);
all restricted to the strata aged 36–51 years (n = 19)) (Fig. 3; Extended
Data Figs 1 and 2 show similar results in adjusted analyses in the full
cohort). Indeed, none of 35 prion cases aged 52–60 had significant Aβ
pathology, with the exception of two cases at ages 57 and 58 positive for
APOE ε4 alleles. Instead, the sCJD cohort shows Aβ pathology in
parenchyma and blood vessels to a similar extent and severity as seen in
iCJD, but only in a much older age group (Extended Data Figs 1 and 2), in
keeping with the chance coincidence of late-onset Aβ pathology and sCJD
as previously documented in a large study of 110 sCJD patients and 110
age-matched controls aged 27–84 (ref. 19) and a study of 2,661 individuals
aged 26–95 (ref. 18). Further, we investigated whether prion and Aβ
pathology co-localize in the iCJD cases. In our series there was a distinct
absence of overlap of Aβ plaques and PrP (Fig. 1d, e) or Aβ CAA and
vascular PrP (Fig. 1b, c), consistent with these pathologies developing
independently.

Figure 1 | Aβ accumulation in central nervous
system parenchyma and blood vessels (CAA) in
iCJD. a, Frontal cortex with widespread diffuse
Aβ deposition, formation of plaques, and
widespread parenchymal and leptomeningeal CAA
(patient no. 4). b, c, Non-colocalized deposition
of Aβ and prion protein. Vessels with CAA do not
entrap or co-seed prion protein. d, e, Adjacent
histological sections stained for Aβ or prion
protein show clearly separated plaques of both
proteins (no. 5). f, An overlay with colour inversion
of prion protein plaques highlights the separation.
g, h, Dual labelling and confocal laser microscopy
show no co-localization of parenchymal Aβ
plaques (nos 5, 6) or CAA (no. 6). i, Aβ is detected
in pituitary glands in patients with a high Aβ
load in the brain. Scale bar corresponds to 200 μm
in a, 100 μm in b–h, and 50 μm in i.

Figure 2 | Immunoblots of Aβ in iCJD patient brains. a–c, 10% (w/v) brain
homogenates from patients with iCJD were analysed by enhanced
chemiluminescence using anti-human Aβ monoclonal antibodies 6E10, which
recognizes full-length APP and fragments that contain the epitope including
Aβ (a), or 82E1, which specifically recognizes Aβ (b), or secondary antibody only
(c). The identity of the patient brain sample is designated above each lane
and the position of molecular mass markers is shown to the left. The
equivalent of 5 μl 10% (w/v) brain homogenate was loaded per lane. The
migration position of Aβ is indicated by the arrow. For gel source data, see
Supplementary Fig. 1.

1Division of Neuropathology, The National Hospital for Neurology and Neurosurgery, Queen Square, London WC1N 3BG, UK. 2Medical Research Council Prion Unit, Queen Square, London WC1N 3BG, UK.
3Department of Neurodegenerative Disease, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK. 4National Prion Clinic, The National Hospital for Neurology and Neurosurgery, Queen
Square, London WC1N 3BG, UK. 5MRC Clinical Trials Unit at University College London, 125 Kingsway, London WC2B 6NH, UK.
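The P = 0.0002 quoted for CAA in the 36–50-year age range can be reproduced with Fisher's exact test on a 2×2 table. The table is our reading of the text, not stated in it: 4 of the 8 iCJD patients with CAA (three widespread, one focal) versus 10 of 290 individuals without CJD in the equivalent age stratum. The sketch below uses only the standard library (in practice `scipy.stats.fisher_exact` would be the usual choice), and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more probable than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)

    def prob(x: int) -> float:
        # P(top-left cell == x) given the fixed margins
        return comb(col1, x) * comb(total - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    # a small relative tolerance keeps tables tied with the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: iCJD patients (n = 8), 36-50-year stratum without CJD (n = 290);
# columns: with pathology, without pathology (our reading of the text).
p = fisher_exact_two_sided(4, 4, 10, 280)
print(f"P = {p:.4f}")  # P = 0.0002
```

The unrounded value is about 0.000195. The one-sided (upper-tail) test gives the same value here, because no table in the lower tail is as improbable as the observed one.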

We then went on to examine pituitary glands for the presence of Aβ
deposits. Pathological species of tau, Aβ and α-synuclein have been
reported in the pituitary gland of patients with neurodegenerative
disease and controls. We examined 55 pituitary glands, 6 from
patients without and 49 from patients with cerebral Aβ pathology,
and found seven samples containing Aβ in the latter group, confirming
frequent Aβ in pituitaries of patients with Alzheimer's disease-like
pathology (Fig. 1i and Extended Data Fig. 3), consistent with the
hypothesis that Aβ seeds have been iatrogenically transmitted to these
patients with iCJD.

There has been longstanding interest in whether other
neurodegenerative diseases associated with the accumulation of aggregates of
misfolded host proteins or amyloids might be transmissible in a
'prion-like' fashion. Experimental seeding of Aβ pathology has previously
been demonstrated in primates and transgenic mice by central nervous
system inoculation with Alzheimer's disease brain homogenate. Of
particular interest with respect to our findings is that seeding by
peripheral (intraperitoneal) inoculation of APP23 (ref. 11) transgenic mice
with Alzheimer's disease brain extract has been demonstrated. While ageing
APP23 mice show mostly parenchymal deposits, the intraperitoneally
seeded mice showed predominantly CAA, a feature seen in patients
with iCJD who had significant Aβ pathology. This experimental study
and our findings suggest that there are mechanisms that allow the
transport of Aβ seeds as well as prions (and possibly other proteopathic
seeds such as tau) from the periphery to the brain. While less than
4% of UK c-hGH-treated individuals have developed iCJD, one out of
eight patients with iCJD had focal, and three had widespread, moderate
or severe CAA. Four patients had widespread parenchymal Aβ
pathology and two further patients had focal cortical Aβ deposits. This
might suggest that healthy individuals exposed to c-hGH are at high
risk of developing early-onset Aβ pathology as this cohort ages.

Figure 3 | Early Aβ accumulation in the parenchyma and blood vessels in a
subset of eight patients with iCJD aged 36–51 years, but not in controls
(stratum aged 36–51 years) of 19 prion diseases of other aetiologies, suggests
human transmission. a, Widespread, moderate-to-severe early-onset CAA in
three, and focal, mild CAA in one iCJD patient, but only one focal, mild
CAA in 19 controls. b, Significant differences of parenchymal Aβ accumulation
(all central nervous system regions, see supplementary material). c, d, Cortical
Aβ load was assessed semiquantitatively and quantitatively and again was
significantly different between the iCJD and age-matched control cohort. For
methods of quantification and calculations of significance levels see
Supplementary Information.

Although none of the iCJD cases with Aβ pathology had
hyperphosphorylated tau neurofibrillary tangle pathology characteristic of
Alzheimer's disease, it is possible that the full neuropathology of
Alzheimer's disease would have developed had these individuals not
succumbed to prion disease at these relatively young ages. An earlier
study concluded that c-hGH recipients did not seem to be at increased
risk of Alzheimer's disease, but this was based on death certificates
only, without autopsy data. However, the severe CAA seen in the
patients with iCJD in our study is unquestionably concerning, and
individuals with such pathology would have been at increasing risk of
cerebral haemorrhages had they lived longer. At-risk individuals, including
patients who had received dura mater grafts, could be screened by
magnetic resonance imaging (MRI) for CAA-related pathologies (such
as microbleeds) and by positron emission tomography (PET) for Aβ
deposition.

It is possible, however, that prions and Aβ seeds co-purify in the
extraction methods used to prepare c-hGH, which might mean that
there would be a relatively higher occurrence of Aβ pathology in those
with iatrogenic prion infection. Analysis of any residual archival
batches of c-hGH for both prions and Aβ seeds might be informative
in this regard. While our data argue against cross-seeding, we cannot
formally exclude the possibility that prions somehow seed Aβ
deposition but do not co-localize with Aβ deposits. While there is no
suggestion that Alzheimer's disease is a contagious disease and no
supportive evidence from epidemiological studies that Alzheimer's
disease is transmissible, notably by blood transfusion, our findings
should prompt consideration of whether other known iatrogenic
routes of prion transmission, including surgical instruments and blood
products, may also be relevant to Aβ and other proteopathic seeds seen
in neurodegenerative diseases. Aβ seeds are known, like prions, to
adhere to metal surfaces and to resist formaldehyde inactivation and
conventional hospital sterilisation.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size; the experiments
were not randomized and the investigators were not blinded to allocation during
experiments and outcome assessment.

Patient recruitment and genotyping. A national referral system for prion diseases 
was established by the Chief Medical Officer in the UK in 2004. UK neurologists were 
asked to refer all patients with suspected prion disease jointly to the National CJD 
Research and Surveillance Unit in Edinburgh and the NHS National Prion Clinic 
(NPC) in London. All patients with possible CJD who had received cadaver-derived 
growth hormone were referred to the NHS National Prion Clinic (London, UK) and 
since 2008 were recruited into the National Prion Monitoring Cohort study. 
Next-generation sequencing to exclude mutations known to be causal of Aβ
pathology. Deep next-generation sequencing using a custom panel was performed as
described previously. Analysis was done using NextGENe and Geneticist Assistant
software (Softgenetics, USA). Variants were assessed for pathogenicity by reference to
the published literature, control population allele frequencies (our primary database
for allele frequency was the Broad Institute's ExAC browser (http://exac.broadinstitute.org/))
and in silico predictive tools. The analysis methodology has been validated
for the detection of APP duplication, which was important to exclude. No causal
mutations for dementia or Aβ pathology were detected (Supplementary Table 2).
As expected, several rare variants were detected that may modify the risk of various
neurodegenerative diseases (Supplementary Table 2).

Autopsies and tissue preparation. Autopsies were carried out in a post-mortem
room designated for high-risk autopsies. Informed consent to use the tissue for
research was obtained in all cases. Ethical approval for these studies was obtained
from the Local Research Ethics Committee of the UCL Institute of Neurology/
National Hospital for Neurology and Neurosurgery. The anterior frontal, temporal,
parietal and occipital cortex and the cerebellum (at the level of the dentate
nucleus) were dissected during the post-mortem procedure and frozen. Samples
of the following areas were taken and analysed: frontal, temporal, parietal, occipital
and posterior frontal cortex including motor strip, basal ganglia, thalamus,
hippocampus, brain stem including midbrain, and cerebellar hemisphere and vermis.
Pituitary glands were taken in all cases.

Tissue samples were immersed in 10% buffered formalin, and prion infectivity
was inactivated by immersion in 98% formic acid for one hour. Tissue samples
were processed to paraffin wax and tissue sections were routinely stained with 
haematoxylin and eosin. 

Antibodies and immunohistochemistry. The following antibodies were used:
anti-PrP ICSM35 (D-Gen Ltd, London, UK***? 1:1,000), anti-phospho-Tau (AT-8,
Innogenetics, 1:100) and anti-βA4 (DAKO 6F3D, 1:50). ICSM35 was stained on a
Ventana Benchmark or Discovery automated immunohistochemical staining
machine (ROCHE, Burgess Hill, UK); βA4 and Tau were stained on a LEICA
BondMax (LEICA Microsystems) or a Ventana automated staining instrument
following the manufacturer's guidelines, using biotinylated secondary antibodies,
a horseradish-peroxidase-conjugated streptavidin complex and diaminobenzidine
as a chromogen.

Immunoblot detection of Aβ in iCJD brain. Biochemical studies were carried
out in a microbiological containment level 3 facility with strict adherence to safety
protocols. Frozen brain tissue was available from seven of eight patients with growth
hormone iCJD (cases 1 and 3-8). 10% (w/v) brain homogenates (grey matter;
frontal cortex) were prepared in Dulbecco's PBS lacking Ca2+ or Mg2+ ions using
tissue grinders as described previously**. 20-µl aliquots were treated with 1 µl
benzonase nuclease (purity >99%; 25 U ml−1; Novagen) for 15 min at 20 °C. Samples
were then mixed with an equal volume of 2× SDS sample buffer (125 mM Tris-HCl,
20% (v/v) glycerol, pH 6.8, containing 4% (w/v) SDS, 4% (v/v) 2-mercaptoethanol
and 0.02% (w/v) bromophenol blue) and immediately transferred to a 100 °C
heating block for 10 min. Electrophoresis was performed on 16% Tris-glycine gels
(Invitrogen), run for 70 min at 200 V, before electroblotting to Immobilon P
membrane (Millipore) for 16 h at 15 V as described previously. Membranes were
blocked in phosphate-buffered saline (PBS) containing 0.05% (v/v) Tween 20
(PBST) and 5% (w/v) non-fat dried skimmed milk powder. Blots were then probed
with anti-human Aβ monoclonal antibodies 6E10 (Covance) and 82E1 (IBL
International, Hamburg, Germany) at final concentrations of 0.2 µg ml−1 in PBST
for at least 1 h. After washing for 1 h with PBST, the membranes were probed with a
1:10,000 dilution of alkaline-phosphatase-conjugated goat anti-mouse IgG
secondary antibody (Sigma-Aldrich no. A2179) in PBST. After washing (90 min with
PBST and 5 min with 20 mM Tris pH 9.8 containing 1 mM MgCl2), blots were
incubated for 5 min in chemiluminescent substrate (CDP-Star; Tropix Inc.) and
visualized on Biomax MR film (Carestream Health Inc.). Anti-human Aβ
monoclonal antibody 82E1 recognizes an epitope specific to the amino terminus of
Aβ, while 6E10 recognizes an epitope spanning residues 3-8 of Aβ and cross-reacts
with full-length APP or APP fragments that contain the epitope.
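The buffer arithmetic above follows C1V1 = C2V2: mixing the homogenate with an equal volume of 2× SDS sample buffer halves every buffer component, so the loaded sample is at 1×. A minimal Python sketch, assuming 20 µl of sample plus 20 µl of buffer and a hypothetical 10 ml of secondary antibody solution (neither volume is stated in the text); the helper name `dilute` is illustrative only.

```python
def dilute(stock_conc, stock_volume, total_volume):
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_conc * stock_volume / total_volume


# 2x SDS sample buffer components as listed in the Methods
two_x_buffer = {
    "Tris-HCl (mM)": 125.0,
    "glycerol (% v/v)": 20.0,
    "SDS (% w/v)": 4.0,
    "2-mercaptoethanol (% v/v)": 4.0,
    "bromophenol blue (% w/v)": 0.02,
}

# Assumed volumes: equal parts sample and buffer (20 ul each)
sample_ul, buffer_ul = 20.0, 20.0
final = {
    name: dilute(conc, buffer_ul, sample_ul + buffer_ul)
    for name, conc in two_x_buffer.items()
}

# Stock volume of secondary antibody for a 1:10,000 dilution in a
# hypothetical 10 ml of PBST, converted from ml to ul.
antibody_ul = 10.0 / 10_000 * 1000

for name, conc in final.items():
    print(f"{name}: {conc:g}")
print(f"secondary antibody: {antibody_ul:g} ul in 10 ml")
```

Because the buffer is mixed 1:1, every component ends at half its stock concentration regardless of the absolute volumes chosen.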
Examination of prion pathology. In all iCJD cases there was variably prominent
microvacuolar change in the neocortices, deep grey nuclei and cerebellar cortex.
Immunostaining for the abnormal prion protein revealed synaptic labelling in all
grey matter areas examined. In all but one case, there were also microplaques in all
grey matter structures. The intensity of immunoreactivity for the abnormal prion
protein varied, but detailed comparison between cases, and within each case, was
not feasible because prolonged formalin fixation had significantly attenuated the
immunoreactivity in some cases. It was nevertheless apparent that microvacuolar
change was more prominent, and synaptic labelling for abnormal prion protein
more intense, in the pre-central gyrus and parietal lobe than in the anterior frontal
and occipital cortices. Deep cortical layers showed more severe changes. In all
cases, microvacuolar degeneration and prion protein deposits were prominent in
the deep grey nuclei and hippocampal formation. They were most severe in the
caudate nucleus and putamen, less severe in the thalamus and least prominent in
the globus pallidus. In the cerebellar vermis there was marked granule cell atrophy
and often widespread loss of Purkinje cells accompanied by severe Bergmann
gliosis, whereas the cortex of the cerebellar hemispheres showed only patchy loss
of Purkinje cells and no significant granule cell loss. Microvacuolar degeneration in
the molecular layer was more prominent in the vermis than in the cerebellar
hemisphere. No apparent difference in prion protein deposition was seen between
vermis and hemisphere. In the dentate nucleus, variably intense synaptic prion
protein immunoreactivity was present, while the cytoarchitecture of the nucleus
was well preserved.

Examination, classification and quantification of Aβ pathology. All brains
were examined according to the ABC classification*’, which assesses the
topographic progression of Aβ pathology in the brain (Thal phases**), the
topographic progression of Tau neurofibrillary tangle pathology (Braak and
Braak*’) and the density of mature (senile) neuritic plaques in the neocortex
(Consortium to Establish a Registry for Alzheimer's Disease (CERAD) criteria**”’).
To allow a more detailed assessment of neocortical Aβ, the original Thal phases
were modified as follows: phase 0, no cortical Aβ; phase 0.5, 1-2 neocortical regions
affected; phase 1, 3-4 neocortical regions involved; phases 2-5 were scored as
published”*. In addition, we carried out a semiquantitative assessment of
neocortical Aβ load on a standardised region within the frontal, temporal, parietal
and occipital lobes, scored as follows: 0, entirely negative; 1, a single small deposit;
2, multiple small, disseminated deposits; 3, multiple small deposits plus an area with
a larger patch; 4, diffuse, moderate numbers of deposits; 5, diffuse, frequent
deposits. For each case, a cumulative score (0-20) of the total semiquantitatively
assessed Aβ load in the neocortex was calculated. Cerebral amyloid angiopathy
(CAA) was graded (0-3) according to the Vonsattel criteria’. CAA was assessed in
the leptomeninges and parenchyma of all hemispheric lobes and the cerebellum,
and a summary score (0-30) was calculated for each case.
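The scoring scheme above can be expressed as a few small functions. This is a sketch under stated assumptions: the neocortical regions counted for the modified Thal phases are taken to be the four sampled lobes, and the 0-30 CAA range is read as a Vonsattel grade of 0-3 for each of two compartments (leptomeninges, parenchyma) in five regions (four lobes plus cerebellum). That reading matches the stated maximum but is our interpretation. All function names are hypothetical, and phases 2-5 of the published Thal scheme are not reimplemented.

```python
from typing import Dict, Optional, Tuple

LOBES = ("frontal", "temporal", "parietal", "occipital")
CAA_REGIONS = LOBES + ("cerebellum",)                 # assumed five regions
CAA_COMPARTMENTS = ("leptomeninges", "parenchyma")


def modified_thal_phase(n_neocortical: int, beyond_neocortex: bool) -> Optional[float]:
    """Early (modified) Thal phase from the number of Abeta-positive neocortical regions.

    Returns None when Abeta extends beyond the neocortex, because phases 2-5
    are scored with the published Thal scheme (not reimplemented here).
    """
    if not 0 <= n_neocortical <= len(LOBES):
        raise ValueError("expected 0-4 neocortical regions (assumed four lobes)")
    if beyond_neocortex:
        return None
    if n_neocortical == 0:
        return 0.0
    if n_neocortical <= 2:
        return 0.5
    return 1.0  # 3-4 neocortical regions


def neocortical_load(scores: Dict[str, int]) -> int:
    """Cumulative semiquantitative Abeta load (0-20): four lobes, each scored 0-5."""
    if set(scores) != set(LOBES):
        raise ValueError("need exactly one score per lobe")
    if any(not 0 <= s <= 5 for s in scores.values()):
        raise ValueError("lobe scores must be 0-5")
    return sum(scores.values())


def caa_summary(grades: Dict[Tuple[str, str], int]) -> int:
    """CAA summary score (0-30): Vonsattel grade 0-3 per (region, compartment)."""
    expected = {(r, c) for r in CAA_REGIONS for c in CAA_COMPARTMENTS}
    if set(grades) != expected:
        raise ValueError("need one grade per (region, compartment) pair")
    if any(not 0 <= g <= 3 for g in grades.values()):
        raise ValueError("Vonsattel grades must be 0-3")
    return sum(grades.values())


example = {"frontal": 3, "temporal": 2, "parietal": 4, "occipital": 1}
print(neocortical_load(example))  # 10
```

Validating inputs against the stated ranges keeps the cumulative scores bounded at 20 and 30, matching the ranges given in the Methods.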

Image acquisition and processing. Histological slides were digitised on a LEICA
SCN400F scanner (LEICA, Milton Keynes, UK) at ×40 magnification and a 65%
image compression setting during export. Slides were archived and managed on
LEICA Slidepath (LEICA, Milton Keynes, UK). For the preparation of light
microscopy images, 1,024 × 1,024-pixel image captures were taken after matching
paired images (Aβ and prion staining) in Slidepath, and overlays in Fig. 1f were
prepared using the colour conversion function in conjunction with the image
overlay in Slidepath. Laser scanning microscopy of double immunofluorescent
tissue preparations was performed on a ZEISS LSM710 confocal microscope (ZEISS,
Cambridge, UK). Publication figures were assembled in Adobe Photoshop. Data
plots were generated using Prism 5 (GraphPad Software, Inc., La Jolla, USA).
Digital image analysis for cortical Aβ quantification. For all cases, Aβ-immunostained
slides from the frontal, temporal, parietal and occipital lobes were digitised
as described above. Digital image analysis of 496 whole slides was performed
using Definiens Developer 2.3 (Definiens, Munich, Germany). Initial tissue
identification was performed at a resolution corresponding to ×5 image
magnification, and stain detection was performed at ×10 resolution. Tissue detection
and initial segmentation identified all tissue within the image, separating the
sample from background and non-tissue regions for further analysis. This
separation was based on identification of the highly homogeneous, relatively
bright/white background region present at the perimeter of each image. A composite
raster image, produced by selecting the lowest pixel value from the three constituent
colour layers (RGB colour model), provided a greyscale representation of
brightness. The mean brightness of this background region was used to exclude all
background regions from further analysis.
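The background-exclusion step can be sketched with NumPy. As described above, the brightness image is the per-pixel minimum over the R, G and B layers. Two details are hypothetical choices, since the exact Definiens rule is not given: the width of the perimeter strip used to estimate background brightness (`border`), and the rule that tissue is anything darker than a fixed fraction (`fraction`) of the background mean.

```python
import numpy as np


def brightness(rgb: np.ndarray) -> np.ndarray:
    """Greyscale brightness: the lowest of the three RGB values at each pixel."""
    return rgb.min(axis=2).astype(float)


def tissue_mask(rgb: np.ndarray, border: int = 5, fraction: float = 0.9) -> np.ndarray:
    """Boolean mask of non-background pixels (hypothetical thresholding rule)."""
    bright = brightness(rgb)
    # Perimeter strip assumed to contain only bright, homogeneous background.
    perimeter = np.zeros(bright.shape, dtype=bool)
    perimeter[:border, :] = True
    perimeter[-border:, :] = True
    perimeter[:, :border] = True
    perimeter[:, -border:] = True
    background_mean = bright[perimeter].mean()
    # Pixels clearly darker than the bright perimeter are kept as tissue.
    return bright < fraction * background_mean


# Synthetic slide: white background with a brownish 20 x 20 "tissue" patch.
slide = np.full((50, 50, 3), 240, dtype=np.uint8)
slide[15:35, 15:35] = (150, 100, 80)
mask = tissue_mask(slide)
print(mask.sum())  # 400
```

Taking the minimum channel means that any coloured pixel (brown chromogen or blue counterstain) appears dark, while only near-white pixels stay bright.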

Stain detection (brown) is based on the transformation of the RGB colour model
to a hue-saturation-density (HSD) representation”. This provides a raster image of
the intensity of each colour of interest (brown and blue). A series of dynamic
thresholds ($T_x$) is then used to identify areas of interest ($A_x$). Initially,
following exclusion of intensely stained areas with values greater than 1 arbitrary
unit (au) (values range from 0 au to 3 au in HSD images), the 5th centile ($C_5$) of
brown stain intensity was calculated as a baseline. This represents the threshold
$T_{\mathrm{brown\,stain}}$ separating the top 5% of $A_{\mathrm{tissue}}$. The standard
deviation ($C_{5\sigma}$) within the lower 95% of $A_{\mathrm{tissue}}$ was used to update
the threshold as $T_{\mathrm{brown\,stain}} = C_5 + 6C_{5\sigma}$, with all pixels above
this threshold classed as 'stain' ($A_{\mathrm{stain}}$) and those below as 'unstained'
($A_{\mathrm{unstained}}$). $A_{\mathrm{stain}}$ was excluded if the intensity of blue
staining was not significantly lower than the level of brown stain (difference less
than 0.1 au), to remove generically dark areas. The remaining $A_{\mathrm{stain}}$ were
further categorised using thresholds based on the mean ($B$) and standard deviation
($B_\sigma$) of brown staining within $A_{\mathrm{unstained}}$:

$$T_{\mathrm{brown}} = B + 3B_\sigma \ \text{(lower threshold)}, \qquad T_{\mathrm{dark\,brown}} = B + 6B_\sigma \ \text{(upper threshold)},$$

to give $A_{\mathrm{unstained}} < T_{\mathrm{brown}} < A_{\mathrm{light\,brown}} < T_{\mathrm{dark\,brown}} < A_{\mathrm{A\beta\,deposit}}$.
Artefacts were then identified as $A_{\mathrm{stain}}$ with an area greater than 1 mm²,
or an area greater than 0.1 mm² with a standard deviation of brown staining below
0.2 au. These $A_{\mathrm{artefacts}}$ were then expanded to include surrounding pixels
with brown staining greater than $C_5$. This excludes large areas of homogeneous
staining and areas of more diffuse, non-specific chromogen deposit.
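A minimal NumPy sketch of the stain-classification thresholds, assuming brown and blue HSD intensity rasters (in au) are already available as arrays. Two interpretive assumptions are flagged in the comments. First, the baseline called C5 is computed as the value separating the top 5% of tissue pixels (the 95th percentile). Second, the boundary conventions (>= versus >) are our choice. The area-based artefact-removal step is omitted. The function name `classify_brown` and the label encoding (0 unstained, 1 light brown, 2 Aβ deposit) are hypothetical.

```python
import numpy as np


def classify_brown(brown, blue, tissue, intense_cutoff=1.0, dark_margin=0.1):
    """Label pixels: 0 unstained/excluded, 1 light brown, 2 Abeta deposit (sketch)."""
    # Baseline statistics ignore intensely stained pixels (> 1 au).
    values = brown[tissue & (brown <= intense_cutoff)]
    # Assumption: "C5" is the value separating the top 5% of tissue pixels.
    c5 = np.percentile(values, 95)
    c5_sd = values[values < c5].std()          # s.d. of the lower 95%
    t_stain = c5 + 6 * c5_sd

    stain = tissue & (brown > t_stain)
    unstained = tissue & ~stain
    # Drop generically dark pixels: blue must be at least dark_margin below brown.
    stain &= (brown - blue) >= dark_margin

    # Light-brown and deposit thresholds from the unstained distribution.
    mu, sd = brown[unstained].mean(), brown[unstained].std()
    t_brown, t_dark = mu + 3 * sd, mu + 6 * sd

    labels = np.zeros(brown.shape, dtype=np.uint8)
    labels[stain & (brown >= t_brown) & (brown < t_dark)] = 1
    labels[stain & (brown >= t_dark)] = 2
    return labels, {"t_stain": t_stain, "t_brown": t_brown, "t_dark": t_dark}


# Synthetic 1D "image": faint background plus ten strongly brown pixels.
brown = np.concatenate([np.linspace(0.0, 0.2, 990), np.full(10, 0.9)])
blue = np.zeros_like(brown)
blue[-1] = 0.85  # a generically dark pixel: blue nearly as strong as brown
tissue = np.ones(brown.shape, dtype=bool)
labels, thresholds = classify_brown(brown, blue, tissue)
print(np.bincount(labels, minlength=3))
```

Because every threshold is derived from the slide's own intensity distribution, the same code adapts to differences in staining intensity between slides, which is the point of using dynamic rather than fixed thresholds.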

The white matter region within the tissue was then manually selected by expert
neuropathologists (Z.J., S.B.). This white matter was excluded from the calculation
of the proportional coverage of $A_{\mathrm{A\beta\,deposit}}$ within $A_{\mathrm{tissue}}$.
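Once white matter is masked out, the proportional coverage reduces to a ratio of pixel counts. A sketch assuming a label array in which the value 2 marks Aβ-deposit pixels (a hypothetical encoding) and boolean masks for tissue and for the manually selected white matter; the function name `deposit_coverage` is illustrative.

```python
import numpy as np


def deposit_coverage(labels: np.ndarray, tissue: np.ndarray, white_matter: np.ndarray,
                     deposit_label: int = 2) -> float:
    """Fraction of grey matter (tissue minus white matter) covered by Abeta deposit."""
    grey = tissue & ~white_matter
    n_grey = int(grey.sum())
    if n_grey == 0:
        raise ValueError("no grey-matter pixels left after excluding white matter")
    return float(((labels == deposit_label) & grey).sum()) / n_grey


labels = np.zeros((10, 10), dtype=np.uint8)
labels[0:2, 5:10] = 2        # 10 deposit pixels in grey matter
labels[9, 0] = 2             # a deposit pixel inside white matter, ignored
tissue = np.ones((10, 10), dtype=bool)
white_matter = np.zeros((10, 10), dtype=bool)
white_matter[:, :5] = True   # left half selected as white matter
coverage = deposit_coverage(labels, tissue, white_matter)
print(coverage)  # 0.2
```

Excluding white matter from both numerator and denominator keeps coverage comparable between slides that contain different amounts of white matter.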
Extended Data Figure 1 | Vertical scatter plot of Aβ pathology in iCJD brain
samples, age-matched and older controls. a, CAA summary score as
described in Methods. In the iCJD group (age range 36-51), three brain samples
scored highly for CAA. In the age-matched control groups, no comparable
pathology was found. Significant CAA was only seen in a cohort including older
individuals carrying the APOE ε4 allele. The outlier in the sCJD group (pink
triangle in the sCJD > 52 year group) had a surgical intervention 40 years
before death, and in addition to CJD also had severe CAA. b, Topographical Aβ
deposition, assessed according to a scheme modified from the Thal classification
as described in the Methods section. In the group of individuals aged 51 years
and below, significant Aβ deposition is seen in the iCJD group, but not in age-matched
prion diseases of different aetiology. c, Cortical Aβ deposition, assessed
according to CERAD. In the group of individuals aged 51 years and below, mature
(neuritic) plaques are seen in the iCJD group, but not in age-matched prion
diseases of different aetiology. Only in the cohort comprising much older
individuals is there an increase in cortical mature plaques. d, e, Semiquantitative
assessment and quantification of neocortical Aβ using Definiens Developer image
analysis show a separation similar to that shown in a and b. APOE ε4
genotype was unavailable in nine of the 85 patients with sCJD, and these were not
included in the graphs. Note the logarithmic scale in e.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 2 plots: a, Cerebral amyloid angiopathy; c, Cortical Aβ (CERAD); e, Quantitative cortical Aβ (logarithmic axis); each plotted against Age (20-90 years) for iCJD, sCJD, sCJD APOE ε4, IPD and vCJD]


Extended Data Figure 2 | Scattergram of Aβ pathology in iCJD brain
samples compared with other prion diseases. Severity scores of CAA
or parenchymal Aβ are plotted against the age of individuals in the cohorts,
demonstrating early onset of CAA and grey matter Aβ pathology in the iCJD
cohort. a, Early onset of CAA. The outlier in the sCJD group (pink triangle) had a
surgical intervention 40 years before death, and in addition to CJD also had
severe CAA. b, c, Early detection of grey matter Aβ by topographical
assessment (Thal phase) and using CERAD criteria. d, e, Semiquantitative
assessment and quantification of neocortical Aβ. APOE ε4 genotype was
unavailable in nine of the 85 patients with sCJD, and these were not included in
the graph. Note the logarithmic scale in e.
Extended Data Figure 3 | Topographical Aβ of the brain samples
corresponding to the pituitaries analysed for Aβ. Plot of the Thal phases
(topographical Aβ deposition) of the brain samples corresponding to the 55
pituitary glands examined for the presence of Aβ. Aβ was assessed in 55
pituitary glands from patients with iCJD (n = 5, age range 36-47), IPD (n = 4,
age range 51-95), sCJD (n = 41, age range 54-89) and non-CJD controls
(n = 5, age range 72-90) (groups shown on x axis). In six patients from the iCJD,
IPD and sCJD groups, no Aβ deposits were found in the brain or pituitary gland.
In 49 patients from all groups (iCJD, IPD, sCJD and non-CJD), there were
variably frequent Aβ deposits in the brain parenchyma, corresponding to Thal
phases 1-5 (distribution shown on y axis). Of these 49 cases, six (IPD
n = 1, sCJD n = 4 and non-CJD n = 1) also showed Aβ deposits in the
pituitary gland (positive cases highlighted in red), and in one patient from the
sCJD group, Aβ deposits were seen in the brain tissue attached to the pituitary
gland (highlighted in blue).
Single-cell messenger RNA sequencing reveals rare intestinal cell types

Hans Clevers & Alexander van Oudenaarden


Understanding the development and function of an organ requires 
the characterization of all of its cell types. Traditional methods for 
visualizing and isolating subpopulations of cells are based on mes- 
senger RNA or protein expression of only a few known marker 
genes. The unequivocal identification of a specific marker gene, 
however, poses a major challenge, particularly if this cell type is 
rare. Identifying rare cell types, such as stem cells, short-lived pro- 
genitors, cancer stem cells, or circulating tumour cells, is crucial to 
acquire a better understanding of normal or diseased tissue bio- 
logy. To address this challenge we first sequenced the transcrip- 
tome of hundreds of randomly selected cells from mouse intestinal 
organoids’, cultured self-organizing epithelial structures that con- 
tain all cell lineages of the mammalian intestine. Organoid buds, 
like intestinal crypts, harbour stem cells that continuously differ- 
entiate into a variety of cell types, occurring at widely different 
abundances’. Since available computational methods can only 
resolve more abundant cell types, we developed RaceID, an algo- 
rithm for rare cell type identification in complex populations of 
single cells. We demonstrate that this algorithm can resolve cell 
types represented by only a single cell in a population of randomly 
sampled organoid cells. We use this algorithm to identify Reg4 as a 
novel marker for enteroendocrine cells, a rare population of hor- 
mone-producing intestinal cells*. Next, we use Reg4 expression to 
enrich for these rare cells and investigate the heterogeneity within 
this population. RaceID confirmed the existence of known entero- 
endocrine lineages, and moreover discovered novel subtypes, 
which we subsequently validated in vivo. Having validated 
RaceID we then applied the algorithm to ex vivo-isolated Lgr5- 
positive stem cells and their direct progeny. We find that Lgr5-positive
cells represent a homogeneous, abundant population of
stem cells mixed with a rare population of Lgr5-positive secretory
cells. We envision broad applicability of our method for discovering
rare cell types and the corresponding marker genes in healthy
and diseased organs. 

Single-cell mRNA sequencing has emerged as a powerful method to 
simultaneously measure cell-to-cell expression variability of thousands 
of genes’. Recently, it was demonstrated that sequencing of randomly 
selected cells from spleen’ and lung tissue® permits the identification of 
known cell types within these organs. The approaches used in these and 
other recently published studies’”""° show good performance in disco- 
vering abundant cell types but cannot detect rare cell types. 

To profile cell types of widely varying abundance within a complex 
mixture we introduce a method for rare cell type identification 
(RaceID) and apply it to investigate rare cell types in the mouse 
small intestine". 

The continuously self-renewing intestinal epithelium is arranged in
crypts and villi. A small number of intestinal stem cells reside near the
crypt bottom and give rise to rapidly proliferating transit amplifying
(TA) cells. While migrating upward along the crypt-villus axis, TA cells
develop into the terminally differentiated cell types*’’. Absorptive
enterocytes constitute the most abundant cell type, while all other
mature cell types contribute only a few percent or less. The secretory
lineage comprises mucus-producing goblet cells, hormone-secreting
enteroendocrine cells, and Paneth cells, which provide a niche for the stem
cells and secrete bactericidal products. In addition, tuft cells are believed to
sense the luminal content.

To obtain clean random mixtures of intestinal cells without contamination
by non-epithelial cell types, we use intestinal organoids,
small epithelial structures containing all major cell types found in
the intestinal epithelium’. Using a modified version of the cell expression
by linear amplification and sequencing (CEL-seq) method”
incorporating unique molecular identifiers to count transcripts’*
(Fig. 1a and Extended Data Fig. 1), we sequenced 238 randomly
selected organoid cells, each with more than 3,000 transcripts in total,
and quantified 3,777 genes with more than five transcripts in at least
one cell.
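The two inclusion criteria above can be written as boolean masks over a genes × cells count matrix. A minimal sketch, assuming the matrix already holds UMI-corrected transcript counts and that cells are filtered before genes (the order, and the function name `filter_counts`, are our assumptions).

```python
import numpy as np


def filter_counts(counts: np.ndarray, min_cell_total: int = 3000, min_gene_count: int = 5):
    """Keep cells with > min_cell_total transcripts, then genes with
    > min_gene_count transcripts in at least one kept cell.

    counts: genes x cells matrix of transcript counts.
    """
    keep_cells = counts.sum(axis=0) > min_cell_total
    kept = counts[:, keep_cells]
    keep_genes = (kept > min_gene_count).any(axis=1)
    return kept[keep_genes], keep_cells, keep_genes


counts = np.array([
    [3000, 3000, 10],  # gene 0
    [5, 4, 1],         # gene 1: never exceeds five transcripts
    [10, 0, 0],        # gene 2
])
filtered, keep_cells, keep_genes = filter_counts(counts)
print(filtered.shape)  # (2, 2)
```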

Hierarchical clustering of the transcriptome correlation matrix sug- 
gested the presence of three major groups of cells (Extended Data 
Fig. 2a). To screen for abundant cell types more systematically we 
employed k-means clustering of the correlation matrix with six clus- 
ters as inferred by the gap statistic’? (see Methods, Fig. 1b and 
Extended Data Fig. 2b). We visualized these clusters in two dimensions 
(Fig. 1c) using t-distributed stochastic neighbour embedding (t-SNE)"® 
and examined whether expression of known intestinal marker genes was
restricted to specific clusters (Extended Data Fig. 3). The intestinal 
alkaline phosphatase (Alpi) is a known enterocyte marker and showed 
a gradual expression increase across clusters 1, 4 and 5 (Extended Data 
Fig. 3a). Cluster 3 comprises distinct cells with non-overlapping 
expression of marker genes for diverse secretory cell types, such as 
the enteroendocrine marker Chga, the goblet cell marker Muc2, or 
the Paneth cell marker Lyz1 (Extended Data Fig. 3b-d). The central 
cluster 2 does not express specific marker genes, but shows pro- 
nounced expression of genes encoding ribosomal proteins (Extended 
Data Fig. 3e), indicating the presence of transit amplifying cells. The 
bottom part of this cluster contains cells expressing low levels of the 
stem cell marker Lgr5 (Extended Data Fig. 3f). 
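The similarity measure used for clustering (and shown in Fig. 1b) is the Euclidean distance between rows of the cell-to-cell transcriptome correlation matrix. A minimal NumPy sketch, assuming Pearson correlation computed directly on a genes × cells matrix (any normalization or transformation applied beforehand is not specified here). The subsequent k-means clustering and gap-statistic selection of six clusters are not reproduced, and `correlation_distance` is a hypothetical name.

```python
import numpy as np


def correlation_distance(counts: np.ndarray) -> np.ndarray:
    """Cells x cells Euclidean distances between rows of the cell-cell
    Pearson correlation matrix. counts: genes x cells."""
    corr = np.corrcoef(counts, rowvar=False)  # columns (cells) are the variables
    diff = corr[:, None, :] - corr[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


counts = np.array([
    [1.0, 2.0, 1.0, 5.0],
    [2.0, 4.0, 2.0, 1.0],
    [3.0, 6.0, 3.0, 0.0],
    [4.0, 8.0, 4.0, 2.0],
])
d = correlation_distance(counts)
print(np.round(d, 3))
```

Comparing whole rows of the correlation matrix means two cells are considered close when they correlate similarly with every other cell, not only with each other; here cells 0-2 are perfectly correlated and therefore at distance 0.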

To detect rare cell types, we screened for outliers that could not be 
explained by a background model accounting for technical and bio- 
logical gene expression noise (see Methods, Fig. 2a and Extended Data 
Fig. 4). Distinct outliers were grouped into 10 novel clusters based on 
transcriptome correlation (see Methods, Fig. 2b). Differential gene 
expression analysis revealed the presence of rare cell types among these 
clusters, comprising goblet, tuft, Paneth and enteroendocrine cells 
(Fig. 2c and Supplementary Table 1). Moreover, RaceID detected three 
secretory precursor clusters, co-expressing Neurog3 with Krt7, Pax4 
or Ang4 (Fig. 2c). Available methods for cell type identification’
were clearly outperformed by RaceID (Extended Data Fig. 5a and
Supplementary Note). Extensive experimental validation confirmed the
Figure 1 | Profiling cell composition of mouse intestinal organoids with
single-cell sequencing. a, Intestinal crypts were isolated from mice and grown
into intestinal organoids as described previously’. Organoids were dissociated,
and single cells collected by fluorescence-activated cell sorting (FACS) were
sequenced using a modified version of the CEL-seq method’** (see Methods).
b, Heat map indicating similarities between 238 single cells measured by
Euclidean distances of the transcriptome correlation matrix (unitless; see
Methods). k-means clustering identified six major groups of cells, colour-coded
along the axes. c, t-SNE map representation of transcriptome similarities
between individual cells. Clusters identified in b are highlighted in different
colours, and the corresponding intestinal cell types identified on the basis of known
marker genes are indicated.


specificity and sensitivity of RaceID (Extended Data Fig. 4c-e and
Supplementary Note). We further showed that cell-cycle-related genes
are unlikely to affect the results of RaceID for our data set (Extended
Data Fig. 5f).
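The outlier screen can be illustrated with an upper-tail probability under a count-noise model. The sketch below assumes, as a hypothetical stand-in for the background model described in Methods, a negative binomial distribution parameterized by the expected mean and variance of a gene within a cluster. The probability threshold (`prob_threshold`) and the minimum number of outlying genes per cell (`min_genes`) are illustrative defaults, not values taken from this text, and the function names are hypothetical.

```python
import math
from typing import Sequence


def nb_upper_tail(k: int, mean: float, var: float) -> float:
    """P(X >= k) for a negative binomial with the given mean and variance (var > mean)."""
    if var <= mean:
        raise ValueError("negative binomial requires variance > mean")
    r = mean * mean / (var - mean)   # size (dispersion) parameter
    p = r / (r + mean)               # success probability
    cdf = 0.0
    for j in range(k):               # sum P(X = j) for j < k
        cdf += math.exp(
            math.lgamma(j + r) - math.lgamma(r) - math.lgamma(j + 1)
            + r * math.log(p) + j * math.log1p(-p)
        )
    return max(0.0, 1.0 - cdf)


def is_outlier(counts: Sequence[int], means: Sequence[float], variances: Sequence[float],
               prob_threshold: float = 1e-3, min_genes: int = 2) -> bool:
    """Flag a cell if at least min_genes genes are improbably high under the model."""
    n_outlying = sum(
        1 for k, m, v in zip(counts, means, variances)
        if nb_upper_tail(k, m, v) < prob_threshold
    )
    return n_outlying >= min_genes


print(is_outlier([50, 60], [2.0, 2.0], [4.0, 4.0]))  # True
print(is_outlier([50, 1], [2.0, 2.0], [4.0, 4.0]))   # False
```

Requiring more than one outlying gene makes a single noisy transcript count insufficient to flag a cell, which trades some sensitivity for robustness.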

Enteroendocrine cells control metabolism by secreting at least
ten different hormones*”’, and individual cells produce subsets of
those’*””. To profile the heterogeneity of enteroendocrine cells, we aimed
to purify a random population of mature enteroendocrine cells. We
identified enteroendocrine markers by a z-score analysis (Fig. 3a).
Among the top-scoring genes were novel markers such as the proteinase
Pappa2 and the largely uncharacterized gene Reg4. We focused on
the latter, since it was highly expressed in enteroendocrine cells, with
hundreds of sequenced mRNAs. We validated Reg4 as an in vivo
marker for enteroendocrine cells by single-molecule fluorescent in situ
hybridization” (smFISH) in the mouse intestine. Co-staining of Chga
and Reg4 revealed high levels of Reg4 in enteroendocrine cells and
substantially lower levels in Paneth cells at the crypt bottom
(Fig. 3b). We then purified Reg4-positive organoid cells derived from
a Reg4-red fluorescent protein (dsRed) reporter mouse (see Methods
and Extended Data Fig. 6). RaceID predicted three major groups
among the 161 cells surviving our filtering criteria (Fig. 3c, d).
Upregulation of defensins suggested that one of these groups comprises
maturation stages of the Paneth cell lineage (Extended Data
Fig. 6e). Within the second group, we identified contamination
with TA cells (Extended Data Fig. 6f, g). The remaining 60 cells
(37%) arise from the enteroendocrine lineage, demonstrating a pronounced
(~eightfold) enrichment of this rare cell type. We observed two major
subgroups with low and high levels of Chga, respectively (Fig. 3d
and Extended Data Fig. 6h). Hormones expressed in cells with low
levels of Chga comprise Cck, Ghrl, Sct, Nts and Gcg (Extended Data
Fig. 7), identifying this group as an intestine-specific branch of
enteroendocrine cells*'*"”.
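The z-score used to rank candidate markers (defined in the Fig. 3a legend) can be sketched as follows. We assume a genes × cells matrix of transcript levels and read "these cells" in the legend as the remaining cells. Returning genes with zero variance among the remaining cells as NaN, rather than infinity, is our choice. The function names are hypothetical.

```python
import numpy as np


def marker_zscores(levels: np.ndarray, in_group: np.ndarray) -> np.ndarray:
    """z = (mean in group - mean in rest) / s.d. in rest, per gene.

    levels: genes x cells; in_group: boolean mask over cells.
    """
    group_mean = levels[:, in_group].mean(axis=1)
    rest = levels[:, ~in_group]
    rest_sd = rest.std(axis=1, ddof=1)
    rest_sd = np.where(rest_sd > 0, rest_sd, np.nan)  # avoid division by zero
    return (group_mean - rest.mean(axis=1)) / rest_sd


def top_markers(z: np.ndarray, names, k: int = 10):
    """Names of the k highest-scoring genes; NaN scores are ranked last."""
    order = np.argsort(np.where(np.isnan(z), -np.inf, z))[::-1]
    return [names[i] for i in order[:k]]


levels = np.array([
    [10.0, 12.0, 1.0, 2.0, 1.0, 2.0],  # gene A: high in the group
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],    # gene B: uniform
    [0.0, 0.0, 3.0, 3.0, 3.0, 3.0],    # gene C: constant outside the group
])
in_group = np.array([True, True, False, False, False, False])
z = marker_zscores(levels, in_group)
print(top_markers(z, ["A", "B", "C"], k=2))  # ['A', 'B']
```

Scaling by the spread among the remaining cells favours genes that are consistently low elsewhere, which is the property wanted from a marker used to enrich a rare population.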
Figure 2 | RaceID algorithm identifies rare cell types among hundreds of
sequenced cells. a, Histogram showing the negative logarithm of the
probability that transcript levels in a particular cell are not explained by a
background model accounting for the expected variability. Clusters are
highlighted as in Fig. 1. The probability threshold for outlier identification
(10 *) is indicated (black broken line). The inset explains the derivation of
outlier probabilities (see Methods) for the gene Lars2 in cluster 5. The transcript
count histogram (grey) is compared to the background model (red), and an
outlier event is highlighted (purple). b, t-SNE map with additional clusters
inferred from outlier cells highlighted (see Methods). c, t-SNE map of
all clusters obtained by the RaceID algorithm (left) and close-up of all clusters in
the secretory lineage (cluster 3 in Fig. 1c). Genes corresponding to known
markers specifically upregulated in each cluster of the secretory lineage are
indicated. EC, enterocytes; EClpr, late enterocyte precursors; ECepr, early
enterocyte precursors; TA, transit amplifying cells; G, goblet cells; EE,
enteroendocrine cells; EEpr, enteroendocrine precursors; P, Paneth cells;
T, tuft cells.


The other subgroup co-expressed Tph1 and Tac1, indicating production
of serotonin and substance P, respectively, and
therefore comprises enterochromaffin cells (Fig. 3d and Extended
Data Fig. 8a, b).

Within this subgroup, RaceID identified three novel subtypes
(Fig. 3d, e). In 7 out of 16 (41%) cells within cluster 3, we detected
co-expression of Tac1 and Cck (Extended Data Fig. 7a), previously
considered to be markers of separate subtypes’. Expression of urocortin
3 (Ucn3), a ligand of the corticotropin-releasing hormone (Crh)
receptor type 2, was significantly elevated in cluster 7 (P < 4.1 × 10−7,
see Methods and Extended Data Fig. 8c). Although colonic expression
of Ucn3 has been described’, it was not known to be expressed in the
small intestine. Finally, we observed strong upregulation of albumin
(Alb, P < 4.3 × 10 °, see Methods) and the related alpha-fetoprotein
(Afp, P ≈ 0, see Methods) in cluster 2 (Extended Data Fig. 8d).
Albumin can bind lipophilic hormones and could regulate their
accessibility”. In the same cluster, we measured upregulation of
VGF nerve growth factor (Extended Data Fig. 8e). See Extended
Data Fig. 8f and Supplementary Table 2 for additional marker genes.

We validated the existence of the novel enterochromaffin subtypes 
in vivo at the mRNA and protein level by conducting smFISH and 
immunofluorescence experiments in the mouse intestinal epithelium 
(Fig. 4, Extended Data Fig. 9, Methods and Supplementary Table 4). 

Having shown that RaceID can discriminate cell types, we
wanted to test our method on stem cells marked by Lgr5 expression.
Heterogeneity of the intestinal stem cell pool is still controversial’? *’ 
and single-cell sequencing could help to better characterize this 




[Figure 3 graphics: t-SNE map (axis: Dimension 2) with selected upregulated genes such as Krt19 and Sox4 labelled]


population. We first used intestinal organoids derived from an Lgr5-green
fluorescent protein (GFP) reporter mouse”? to purify and
sequence 96 Lgr5-GFP+ cells. RaceID detected only a single large
cluster and a few outliers that were mostly Paneth cells (Extended Data
Fig. 10), suggesting that intestinal stem cells represent a uniform
population. Since a distinct reserve pool of quiescent Lgr5-positive
cells has been suggested”**’, we next tried to characterize a population
of ex vivo isolated Lgr5-expressing cells. For this experiment,
we isolated 192 Lgr5-enhanced GFP+ (EGFP+) cells from the small
intestine of an Lgr5-EGFP reporter mouse“ (Supplementary Table
5). RaceID classified these cells into a single large homogeneous cluster
and a few outliers (Fig. 5a, b).

As a complementary approach, we traced the progeny of Lgr5-positive
cells in vivo, using a reporter mouse that expresses CreERT2 from
an Lgr5 promoter and YFP from a Rosa26 promoter with a loxP-flanked
transcriptional roadblock. Administration of tamoxifen leads

Figure 3 | Reg4 is a novel marker of differentiated enteroendocrine cells.
a, Histogram of the top ten z-scores for upregulation in differentiated
enteroendocrine cells. For each gene, we extracted the average level observed
in mature enteroendocrine cells (clusters 3 and 16), subtracted the average level
observed across all remaining cells, and divided by the standard deviation of
transcript levels in these cells. b, Validation of Reg4 as an in vivo marker
of mature enteroendocrine cells by single-molecule fluorescent in situ
hybridization (smFISH). Cryosections of mouse small intestine were hybridized with
smFISH probes against Chga, conjugated to tetramethylrhodamine (TMR,
green), and Reg4, conjugated to cyanine 5 (Cy5, red). Cell borders were
visualized with AlexaFluor 488-conjugated phalloidin (blue). The
enteroendocrine cell (arrow) expresses a high level of Chga and co-expresses Reg4.
Paneth cells at the crypt bottom (arrowheads and inset) express strongly
reduced levels of Reg4. Scale bar, 20 µm. c, Heat map representing the
transcriptome similarities measured by the Euclidean distance of the
transcriptome correlation matrix (see Methods) for Reg4-positive cells. RaceID
clusters are colour-coded along the axes. Cluster numbers are shown for the
bigger clusters. d, t-SNE map showing all clusters identified by RaceID for
Reg4-positive cells. Different colours and numbers highlight distinct clusters.
Selected upregulated genes are shown for individual clusters (P < 10°, see
Methods). Close-ups are shown for clusters of enteroendocrine cells. e, Heat
map of hormone expression (logarithmic scale) in subtypes of enteroendocrine cells
identified by RaceID. The two groups of cells with low and high expression of
Chga, respectively, display distinct patterns.

[Figure 4 panel titles: a, Cck+/Tac1+, Cck+/Tac1−, Cck−/Tac1+]

Figure 4 | Single-molecule FISH and immunofluorescence experiments
confirm expression of markers for enteroendocrine cell subpopulations in
the mouse small intestine. a, Small intestine cryosections were hybridized
with smFISH probe libraries. Scale bar, 10 µm. Cck and Tac1 are expressed by a
subset of enteroendocrine cells. Probes against Cck, conjugated to Cy5 (upper
panel, red), and against Tac1, conjugated to TMR (lower panel, green), were
used for hybridization. Cell borders were visualized by staining with phalloidin
conjugated with AlexaFluor 488 (blue). Enteroendocrine cells co-expressing
the two markers were observed (left), as well as cells expressing only Cck
(middle) or Tac1 (right). Arrows point at cell borders. b, Immunostaining was
performed on cryosections of mouse small intestinal tissue. Scale bar, 20 µm.
Expression of CCK and TAC1 was observed in a subset of enteroendocrine
cells. CCK and TAC1 were visualized by indirect immunostaining with
antibodies against CCK (upper panel, red) and against TAC1 (middle panel,
green). Nuclei were counterstained with DAPI (blue). Rare cells co-expressing
the two markers were observed (left), as well as cells expressing only
TAC1 (middle) or only CCK (right). Arrowheads point at CCK- or TAC1-negative
cells.


10 SEPTEMBER 2015 | VOL 525 | NATURE | 253 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 5 graphics, panels a–e. Recoverable labels: t-SNE axes Dimension 1 and Dimension 2; panel c clusters Ppr/Gpr, ECpr/EC, TA/CBC and class (3); panel d categories Lgr5-EGFP+ class (1), Ppr, P, EE, EC, G]


Figure 5 | Characterization of ex vivo isolated Lgr5-EGFP+ cells. a, Heat
map of transcriptome similarities measured by the Euclidean distance of the
transcriptome correlation matrix (see Methods) for Lgr5-EGFP+ cells purified
using an Lgr5-EGFP reporter mouse. RaceID clusters are colour-coded along
the axes. Colours correspond to panel b. b, t-SNE map of RaceID clusters for
Lgr5-EGFP+ cells. Cells of cluster 7 express non-coding RNAs (Malat1,
Kcnq1ot1) and could not be characterized. The other cell types were assigned
based on marker gene expression. The stem cell classes (1) (dotted line) and (2)
(dashed line) are outlined (see text for details). c, t-SNE map of RaceID clusters
for 5-day lineage tracing progeny of Lgr5-positive (YFP+) cells. Cell types
identified based on marker genes are indicated. The stem cell class (3) is
outlined (dashed line, see text for details). d, Fraction of Lgr5-positive cells for


to yellow fluorescent protein (YFP) production in Lgr5-positive
cells and their progeny. We sequenced 432 YFP-positive cells
collected five days after label induction (Supplementary Table 5). As
expected, RaceID detected differentiated cells of all major lineages,
with a relatively large proportion of Paneth cells (Fig. 5c). A possible
explanation for the over-representation of Paneth cells is label
induction in mature Paneth cells or their precursors. We then quantified
the fraction of Lgr5-positive cells in all major lineages. To extract cells
of a lineage independently of their maturation state, we only required >5
transcripts of a lineage marker (Lyz1 for Paneth cells, Chgb for
enteroendocrine cells, Alpi for enterocytes, and Clca3 for goblet cells). Paneth
cells were split into early and late stages, with Lyz1 expression lower or
higher than the median, respectively. While we detected Lgr5
transcripts in ~45% of the Lgr5-EGFP+ cells, this fraction was lower than
15% for most of the other major cell types (Fig. 5d). Only for Paneth
cells did we observe an elevated proportion of Lgr5-positive cells, which
was significantly higher in early (~50%) than in late Paneth cells
(~28%) (Fig. 5d). This could be due to the Lgr5 RNA half-life exceeding
the rapid transition time of stem cells into Paneth cells, leading to a
transient state in which stem and Paneth cell genes are co-expressed.
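The lineage-assignment and counting rule above can be sketched in Python. This is a minimal illustration, assuming per-cell transcript counts for Lgr5 and for one lineage marker as NumPy arrays, and assuming that a cell counts as Lgr5-positive when at least one Lgr5 transcript is detected. The function names are hypothetical, and the binomial standard error is only one simple way to express sampling uncertainty, not necessarily the exact statistic behind Fig. 5d.

```python
import numpy as np


def lgr5_positive_fraction(lgr5, marker, min_marker=5):
    """Fraction of lineage cells with detectable Lgr5, plus a binomial standard error.

    lgr5, marker: per-cell transcript counts (1D arrays of equal length).
    A cell belongs to the lineage if it has more than `min_marker` marker transcripts;
    it counts as Lgr5-positive if at least one Lgr5 transcript was detected (assumption).
    """
    in_lineage = marker > min_marker
    n = int(in_lineage.sum())
    if n == 0:
        return float("nan"), float("nan"), 0
    frac = float((lgr5[in_lineage] > 0).mean())
    se = float(np.sqrt(frac * (1.0 - frac) / n))  # binomial sampling error of a proportion
    return frac, se, n


def split_paneth_by_lyz1(lyz1, min_marker=5):
    """Split Paneth cells (Lyz1 > min_marker) into early/late stages at the median Lyz1 level."""
    paneth = lyz1 > min_marker
    median = np.median(lyz1[paneth])
    return paneth & (lyz1 < median), paneth & (lyz1 > median)
```

For example, `split_paneth_by_lyz1(lyz1)` yields two boolean masks that can be passed to `lgr5_positive_fraction` (as `marker > 5` subsets) to compare early and late Paneth cells.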
To examine whether the population of Lgr5-expressing cells shows any
fate bias towards a particular lineage, we first distinguished three
classes of stem cells: (1) all Lgr5-EGFP+ cells (Fig. 5b), (2) the subset of
Lgr5-EGFP+ cells after removal of the outliers identified by RaceID
(cluster 1, Fig. 5b), and (3) the stem cell/early TA cluster from the
lineage tracing data (cluster 4, Fig. 5c). Based on the RaceID prediction,
we consider class (2) a homogeneous pool of stem cells, while class
(1) contains a few additional Lgr5-expressing cells of other lineages.
Class (3) represents the homogeneous stem cell population identified by
RaceID in the lineage tracing data and is thus expected to resemble
the major intestinal lineages. See main text for details. Error bars and P value
were inferred from binomial statistics reflecting uncertainty due to sampling.
The number of cells (n) is indicated below the plot. *P < 0.05. e, Mean
expression of marker genes in different sets of Lgr5-positive cells and in cells of
the major intestinal lineages. Class (1) corresponds to all Lgr5-EGFP+ cells
shown in b, while classes (2) and (3) correspond to the sub-populations indicated
by the dashed line in b and c, respectively. Error bars indicate the standard
deviation across cells. The number of cells (n) is given in the legend. EC,
enterocytes; ECpr, enterocyte precursors; TA, transit amplifying cells; CBC, 
crypt base columnar cells; G, goblet cells; Gpr, goblet cell precursors; EE, 
enteroendocrine cells; P, Paneth cells; Ppr, Paneth cell precursors; Ppr/Gpr, 
early precursors of the Paneth and goblet cell lineage. 


class (2). We then performed a marker gene analysis in all three classes
and, for comparison, in cell populations of all major lineages (Fig. 5e).
The three classes of stem cells showed similar expression of Lgr5 and
other stem cell markers (Hopx and Lrig1). Expression of secretory
lineage markers (Lyz1 and Chgb) was substantially lower in classes (2)
and (3) than in class (1), and did not exceed the background level
observed in any other lineage (Fig. 5e). This argues against additional
secretory cells in classes (2) and (3) that could have remained undetected
by RaceID. Elevated expression of Lyz1 and Chgb in class (1) is thus
solely due to the few Lgr5-EGFP+ secretory cells identified by RaceID
(Fig. 5b). Interestingly, Lgr5 transcript levels in early Paneth cells were
similar to those in stem cells and reduced in the other cell types (Fig. 5e).

Taken together, we conclude that, both in organoids and in vivo,
Lgr5-positive cells represent a homogeneous population of cells mixed
with a rare population of Paneth and enteroendocrine cells. In
comparison to the other lineages, Paneth cells express the highest level of
Lgr5, consistent with the observation that Paneth cell precursors can
revert to the stem-cell state upon tissue damage. It remains a
possibility that high Lgr5 expression in Paneth cells is an artefact due to
sequencing doublets of Paneth and crypt bottom stem cells. However,
we consider this unlikely, because expression of Lgr5 is significantly
elevated in early versus late Paneth cells (Fig. 5d). Finally, we would
like to caution that heterogeneity among lowly expressed genes
could still exist within the stem cell pool, which would be invisible
owing to the limited sensitivity of current single-cell sequencing
protocols. Alternatively, stem cell heterogeneity could extend to Lgr5-low
cells as described previously, which are not captured by our
enrichment strategy.

In summary, we demonstrated here the ability of RaceID to correctly 
classify different cell types in a complex mixture and reveal heterogeneity 
among rare cells. We believe that single-cell mRNA sequencing in
combination with the RaceID algorithm is a powerful tool to unravel 
heterogeneity of rare cell types in both healthy and diseased organs. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size, and the
experiments were not randomized.

Generation of the Reg4-dsRed mouse. Reg4-dsRed knock-in mice were
generated by homologous recombination in embryonic stem cells by targeting a
diphtheria toxin receptor-2A peptide-dsRed express2 cassette to the ATG start codon
of Reg4. Generation of the knock-in mouse and experiments were performed
according to guidelines and reviewed by the Dier Experimenten Commissie
(DEC) of the KNAW.

Lgr5-GFP organoids. Organoids from Lgr5-GFP-DTR reporter mice were
cultured under standard conditions. Nine days after splitting, the organoids were
dissociated into single cells using TrypLE (Invitrogen) for 15 min at 37 °C and
mechanical disruption using a glass Pasteur pipette. Cells were washed twice in
advanced DMEM/F12 (GIBCO) and resuspended in advanced DMEM/F12 + 41g ul? DNase I
(Roche) and propidium iodide (PI; Sigma). From the PI-negative fraction, cells
expressing high levels of GFP (top 10%) were sorted directly into 96-well plates
containing 100 µl TRIzol per well.

Lineage tracing experiments. For lineage tracing experiments we injected 0.4 mg
tamoxifen into 3-month-old Lgr5-CreERT2 C57BL/6J mice crossed to a
Rosa26-LSL-YFP reporter mouse.

Isolation of crypts from mouse small intestine. Crypts were isolated from the
Reg4-dsRed mouse as described previously. Briefly, the whole small intestine was
dissected, flushed with cold Ca2+- and Mg2+-free PBS and cut into 4–5 cm pieces for
convenience. Intestines were cut open longitudinally and villi were scraped off
with a glass slide. Intestine fragments were washed twice with cold Ca2+- and
Mg2+-free PBS, then incubated with 5 mM EDTA in PBS at 4 °C for 30 min, with
gentle agitation. Crypts were released by vigorous shaking of the tissue fragments,
pelleted by centrifugation (200g at 4 °C for 5 min), washed once with cold PBS and
once with Advanced DMEM/F12 medium (Life Technologies), pelleted by
centrifugation and used to generate organoids.

Isolation of Lgr5-EGFP+ cells from the mouse intestine. Freshly isolated small
intestines of Lgr5-EGFP-IRES-creERT2 mice were incised along their length and
villi were removed by gentle scraping. The tissue was then washed in ice-cold PBS0
and subsequently incubated in PBS0/EDTA (5 mM) for 5 min followed by gentle
shaking to remove remaining villi. The intestine was then incubated in PBS/EDTA
for 30 min at 4 °C. Vigorous shaking yielded free crypts that were incubated in
MEM (Gibco) supplemented with trypsin (2 mg ml−1; Sigma) and DNase I
(2,000 U ml−1; Sigma) for 30 min at 37 °C. Subsequently, cells were spun down,
resuspended in MEM/DNase and DAPI (Life Technologies) and filtered through a
40-µm mesh. DAPI-negative, GFP-expressing cells were sorted directly into TRIzol
(Life Technologies) using a BD FACSAria II cell sorter (BD Bioscience).
Intestinal organoid culture. Villin-Cre organoids were a gift from H. Farin.
Organoid culture was carried out as described. Briefly, organoids were grown
in a drop of Matrigel (BD Biosciences), overlaid with ENR medium (see
below). Organoids were passaged weekly at a 1:4 split. Briefly, the old medium
was aspirated, the old Matrigel drop broken, and organoids were washed and pelleted
by centrifugation at 200–300g, then mixed with fresh Matrigel. Drops of the
Matrigel-organoid mix were placed on the bottom of a tissue culture dish, allowed to
solidify, and overlaid with ENR culture medium. Reg4-dsRed organoids were derived from the
Reg4-dsRed mouse as described. Briefly, small intestinal crypts were isolated,
pelleted and mixed with Matrigel. The crypt-Matrigel mix was plated and cultured in
ENR medium. Newly generated organoids were expanded in culture for at least
three weeks before harvesting for the experiment. ENR medium is
Advanced DMEM/F12 medium supplemented with penicillin/streptomycin,
1× GlutaMAX, 10 mmol l−1 HEPES, 1× B27 (Life Technologies), Noggin
(conditioned medium, 10% volume), R-spondin 1 (conditioned medium, 10% volume),
1 mmol l−1 N-acetylcysteine (Sigma) and 50 ng ml−1 recombinant mouse EGF
(Peprotech). The R-spondin 1 and Noggin conditioned media were generated in
HEK293T cells stably expressing HA-mouse Rspo1-Fc (gift from Calvin Kuo,
Stanford University) or transiently transfected with mouse Noggin-Fc plasmid.
Advanced DMEM/F12 with penicillin/streptomycin, 10 mmol l−1 HEPES, and
1× GlutaMAX was conditioned for 1 week.

Single-cell suspension preparation, FACS sorting and RNA extraction.
Organoids were dissociated to single cells as described previously, with a few
modifications. Briefly, medium was removed from organoid cultures, and organoids were
resuspended in TrypLE and incubated at 37 °C for 5–15 min, with passaging through a glass
pipette every 5 min and microscopic monitoring. Upon disruption of most of
the cell aggregates, cells were pelleted by centrifugation (5 min at 300–400g),
washed twice with Advanced DMEM/F12 with 10% fetal calf serum and resuspended
in the same medium. Cells were strained through a 20-µm mesh filter and stained
with either propidium iodide (Villin-Cre cells) or DAPI (Reg4-dsRed cells).
Single cells were then sorted by flow cytometry (FACS Aria, BD). Each cell was
sorted directly into a single well of a 96-well PCR plate, each well containing 100 µl
TRIzol reagent. An identical quantity of ERCC spike-in RNA (0.03 µl of 1:50,000
dilution) was added to each well. Total RNA was extracted from each cell
according to the TRIzol manufacturer's protocol with a few alterations. To facilitate
RNA precipitation, 0.2 µl of GlycoBlue reagent (Life Technologies) was added to
each sample. Isopropanol precipitation was carried out overnight. RNA pellets
were air dried for up to 15 min, then resuspended in the CEL-Seq first-strand
primer solution.

Control library with mouse embryonic stem cells and fibroblasts. Irradiated
mouse embryonic fibroblasts were cultured in DMEM containing 10% FBS
(Gibco), 2 mM GlutaMAX (Gibco), 0.1 mM MEM nonessential amino acids
and 1% Pen/Strep (Gibco). Wild-type mouse embryonic stem cells were derived
from C57BL/6 mice and cultured in DMEM containing 10% FBS (Gibco),
2 mM GlutaMAX (Gibco), 0.1 mM MEM nonessential amino acids, 1% Pen/Strep
(Gibco) and 1,000 U ml−1 LIF (ESGRO). The control library contained 5
ESCs with barcodes 1–5, 5 MEFs with barcodes 6–10, 75 mouse small intestinal
organoid cells with barcodes 11–85, 6 controls without template but with reverse
transcription primer (barcodes 87–92) and 5 empty controls.

Pool-and-split control sample preparation. Organoids derived from the Reg4-dsRed
knock-in mouse were dissociated to a single-cell suspension, and dsRed-positive
cells were sorted using FACS as described above. Pools of 100 cells were
collected into single tubes containing 100 µl TRIzol reagent and processed for
total RNA extraction. Total RNA from 100 cells was resuspended in nuclease-free
water, and ERCC spike-in RNA was added to the RNA solution (3 µl of 1:50,000
dilution per 100 pooled cells). Each of the pooled samples was then split into 100
separate portions. CEL-Seq was performed on the 200 resulting split samples.
Tissue preparation and immunofluorescence. Freshly dissected mouse small
intestines were flushed and fixed in cold 4% paraformaldehyde in PBS for 3 h.
After fixation, the intestines were incubated in cryoprotective solution (30%
sucrose in PBS) at 4 °C overnight, then frozen blocks were prepared in Tissue-Tek
O.C.T. compound (VWR) and stored at −80 °C. 5-µm-thick cryosections were cut
and mounted on poly-L-lysine (Sigma)-coated cover glass. Sections were fixed in
4% paraformaldehyde for 15 min, permeabilized with 0.25% Triton X100 (Sigma)
in PBS for 5 min and blocked for 1 h in PBS containing 0.2% Triton X100, 1% BSA
(Sigma), and 2% each of normal donkey (Jackson) and goat (Monosan) serum. Next,
the sections were incubated for 1 h with primary antibody solution (in PBS with
1% BSA), washed, and incubated for 1 h with secondary antibody (in PBS). Nuclear
counterstaining was done with DAPI (100 ng ml−1 in PBS) for 10 min. The sections
were then washed and mounted in Fluoromount-G (Electron Microscopy
Sciences). Imaging was done on a Leica fluorescence microscope with a 100× oil-immersion
objective, using MetaMorph imaging software. Images were processed
and combined using ImageJ and Photoshop.

Antibodies. For indirect immunofluorescence the following antibodies were used: 
anti-mouse Urocortin 3 rabbit polyclonal antibody (Yanaihara Institute, Y364); 
anti-mouse CCK rabbit polyclonal antibody (LifeSpan BioSciences, aa26-33, 
LS-C190673); anti-mouse VGF rabbit polyclonal antibody (Abcam, ab69989); 
anti-mouse Tac1 guinea pig polyclonal antibody (Abcam, ab10353). The following
secondary antibodies were used: goat anti-rabbit IgG, Cy5-conjugated (Life
Technologies, A10523); donkey anti-guinea pig IgG, TRITC-conjugated
(Jackson Labs, 706-025-148). For direct immunofluorescence we used the 
mouse monoclonal antibody against AFP, conjugated to Alexa Fluor 594 (Cell 
Signaling, 7877). 

CEL-seq library preparation. Single cells were processed using the previously
described CEL-seq technique, with several modifications. A 4-bp random barcode
serving as a unique molecular identifier (UMI) was added to the primer between the
cell-specific barcode and the poly(T) stretch (Supplementary Table 3). Dried RNA,
prepared from single cells by TRIzol extraction, was resuspended in
primer solution, denatured at 70 °C for 2 min and quickly chilled, after which
the first-strand synthesis mix was added. The rest of the protocol was carried
out as published, with no substantial alterations. Libraries were sequenced on
an Illumina HiSeq 2500 using 50-bp paired-end sequencing.

Single-molecule FISH. Probe libraries were designed and fluorescently labelled as
previously described. All probe libraries consist of 20 to 39 oligonucleotides of
20-bp length (see Supplementary Table 3 for probe sequences) complementary to
the coding sequence of the genes. Cells were hybridized overnight with probes at
30 °C, as previously described. DAPI and phalloidin-AlexaFluor488 staining was
done after washes. Images were acquired on a PerkinElmer spinning disc confocal
microscope with a 100× oil-immersion objective (numerical aperture 1.4)
using PerkinElmer Volocity software. Images were recorded as stacks with a z
spacing of 0.3 µm. Diffraction-limited dots corresponding to single mRNA
molecules were automatically detected using custom Matlab software, based on
previously described algorithms. Briefly, the images were first filtered using a
three-dimensional Laplacian of Gaussian filter, followed by selection of the intensity
threshold at which the number of connected components was least sensitive to the
threshold.
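A minimal Python sketch of the threshold-selection step, assuming the stack has already been filtered with a three-dimensional Laplacian of Gaussian and sign-inverted so that spots are bright (for example with `scipy.ndimage.gaussian_laplace`; that choice is ours, the original used custom Matlab software). Connected components use 6-connectivity, and "least sensitive" is read as the smallest absolute change in spot count between neighbouring thresholds; both are assumptions, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """Count 6-connected components of True voxels in a 3D boolean array."""
    seen = np.zeros(mask.shape, dtype=bool)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_components = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        n_components += 1
        seen[start] = True
        queue = deque([start])
        while queue:  # breadth-first flood fill of one component
            z, y, x = queue.popleft()
            for dz, dy, dx in steps:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
    return n_components


def select_threshold(filtered, thresholds):
    """Pick the threshold where the spot count changes least (plateau of the count curve)."""
    counts = np.array([count_components(filtered > t) for t in thresholds], dtype=float)
    sensitivity = np.abs(np.gradient(counts))  # change in spot count per threshold step
    sensitivity[counts == 0] = np.inf          # ignore thresholds that remove every spot
    best = int(np.argmin(sensitivity))
    return thresholds[best], int(counts[best])
```

In real stacks the plateau is rarely perfectly flat, so scanning a fine grid of thresholds and inspecting the count curve remains advisable.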

Quantification of transcript abundance. Paired-end reads obtained by CEL-seq
were aligned to the transcriptome using bwa (version 0.6.2-r126) with default
parameters. The transcriptome contained all RefSeq gene models based on the
mouse genome release mm10 downloaded from the UCSC genome browser and
contained 31,109 isoforms derived from 23,480 gene loci. All isoforms of the same
gene were merged into a single gene locus. The right mate of each read pair was
mapped to the ensemble of all gene loci and to the set of 92 ERCC spike-ins in
sense direction. Reads mapping to multiple loci were discarded. The left read
contains the barcode information: the first eight bases correspond to the
cell-specific barcode, followed by four bases representing the unique molecular
identifier (Supplementary Table 3). The remainder of the left read contains a poly(T)
stretch followed by a few (<15) transcript-derived bases. The left read was not used for
quantification. For each cell barcode we counted the number of unique molecular
identifiers for every transcript and aggregated this number across all transcripts
derived from the same gene locus. Based on binomial statistics, we converted the
number of observed unique molecular identifiers into transcript counts.
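The conversion from observed UMIs to transcript counts can be illustrated under a simple uniform-labelling model, which we assume here: if n molecules each receive one of K = 4^4 = 256 UMIs at random, the expected number of distinct UMIs is K(1 − (1 − 1/K)^n) ≈ K(1 − e^(−n/K)), which inverts to n ≈ −K ln(1 − k/K) for k observed UMIs. The Python sketch below, with hypothetical function names, parses the left-read layout described above (8-bp cell barcode followed by a 4-bp UMI) and applies this estimate per cell and gene; it is not the authors' pipeline.

```python
import math
from collections import defaultdict

BARCODE_LEN = 8          # cell-specific barcode: first 8 bases of the left read
UMI_LEN = 4              # unique molecular identifier: next 4 bases
N_UMIS = 4 ** UMI_LEN    # 256 distinct UMIs available per cell and gene


def parse_left_read(seq):
    """Split a left read into (cell barcode, UMI); the poly(T) remainder is ignored."""
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def umis_to_transcripts(k, n_umis=N_UMIS):
    """Estimate molecule number from k distinct UMIs, assuming uniform random UMI labelling."""
    if k >= n_umis:
        return math.inf  # UMI pool saturated: the estimate diverges
    return -n_umis * math.log(1.0 - k / n_umis)


def count_transcripts(read_pairs):
    """read_pairs: iterable of (left_read_sequence, gene_locus) for uniquely mapped pairs."""
    umis = defaultdict(set)
    for left_read, gene in read_pairs:
        cell, umi = parse_left_read(left_read)
        umis[(cell, gene)].add(umi)  # duplicates of the same molecule collapse here
    return {key: umis_to_transcripts(len(seen)) for key, seen in umis.items()}
```

The divergence as k approaches K is why genes that saturate the UMI pool are treated with caution in the data preparation step.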

Rare cell type identification algorithm RaceID. Data preparation. The
clustering algorithm takes as input a matrix with transcript counts for all genes in each
cell. As a first preprocessing step, cells with low overall transcript counts are
removed. We require at least 3,000 transcripts per cell for the whole organoid data
and 1,000 transcripts per cell for the Reg4-positive cells, for which we observed
overall lower transcript numbers. Next, the total transcript count within each cell is
normalized to the median transcript number across cells. Alternatively,
downsampling of the transcript pool to the required minimal total transcript count can
be applied. A pseudocount of 0.1 is then added to the expression value of each
gene to avoid divergences when computing fold changes. In the next step, lowly
expressed genes are filtered out: genes that are not expressed with a minimum of
five transcripts (whole organoid data) or three transcripts (Reg4-positive cells)
in at least one cell are discarded. Furthermore, highly expressed
genes that saturate the pool of available UMIs (>500 transcripts after
normalization for the whole organoid data and >2,000 transcripts after normalization for
the Reg4-positive cells) are discarded, since these genes potentially introduce
artefacts in the clustering. For the Reg4-positive cells we relaxed this last filtering
step, since hormones crucial for cell type determination saturated the UMIs in
only very few cells. We tested the robustness of the RaceID predictions using the
more relaxed setting for the random organoid cells (Extended Data Fig. 5e)
and the more stringent setting for the Reg4-positive cells (Extended Data Fig. 6d).
In each case, we observed the same rare cell types for the cells that
survive the filtering criteria of both settings. We also analysed Lgr5-positive
intestinal cells. Here we applied the same filtering criteria as used for the whole organoid
data, since single-cell sequencing yielded a high number of transcripts per cell.
We also applied these settings to all ex vivo isolated cells. However, these data were
downsampled to the same transcript number in all cells (that is, from every cell a
subset of transcripts is sampled with a size corresponding to the minimal total
transcript count across all cells surviving the filtering step). This approach was
applied because batch effects due to combining different libraries were more
pronounced, and downsampling reduces technical noise caused by variation in library
complexity.
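The preparation steps above can be sketched in Python as follows. This is a minimal illustration on a genes × cells NumPy array, using the whole-organoid thresholds quoted in the text as defaults. Treating a gene as saturated when it exceeds the upper bound in any cell, and the function names, are assumptions rather than the RaceID R implementation (Supplementary Data 1).

```python
import numpy as np


def preprocess(counts, min_total=3000, min_expr=5, max_expr=500, pseudocount=0.1):
    """RaceID-style data preparation on a genes x cells matrix of transcript counts.

    The rule that a gene is 'saturated' if it exceeds max_expr in any cell is an assumption.
    """
    keep_cells = counts.sum(axis=0) >= min_total
    x = counts[:, keep_cells].astype(float)
    totals = x.sum(axis=0)
    x = x / totals * np.median(totals)        # normalize each cell to the median total
    x = x + pseudocount                       # avoid divergent fold changes later on
    expressed = (x >= min_expr).any(axis=1)   # >= min_expr transcripts in at least one cell
    saturated = (x > max_expr).any(axis=1)    # likely to saturate the UMI pool
    keep_genes = expressed & ~saturated
    return x[keep_genes], keep_cells, keep_genes


def downsample(counts, rng):
    """Subsample every cell (column) without replacement to the smallest total count."""
    counts = np.asarray(counts, dtype=int)
    target = int(counts.sum(axis=0).min())
    out = np.zeros_like(counts)
    genes = np.arange(counts.shape[0])
    for j in range(counts.shape[1]):
        pool = np.repeat(genes, counts[:, j])              # one entry per transcript
        picked = rng.choice(pool, size=target, replace=False)
        out[:, j] = np.bincount(picked, minlength=counts.shape[0])
    return out
```

For the Reg4-positive cells, the same function would be called with `min_total=1000`, `min_expr=3` and `max_expr=2000`.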

k-means clustering. The clustering step of RaceID identifies larger clusters of
different cells by k-means clustering. First, a similarity matrix is computed that
contains Pearson's correlation coefficients for all pairs of cells. Subtracting the
coefficients from one yields a distance matrix, which serves as input for k-means
clustering: k-means is applied to the rows of this distance matrix using the
Euclidean metric. In comparison to direct clustering of the expression matrix, this
approach yielded improved cluster separation. The number of clusters used for
k-means clustering is determined from the gap statistic, that is, from the
difference between the average within-cluster dispersion in uniformly distributed data and in the
actual data. By default, the cluster number is determined as the first local
maximum of the gap statistic where the maximum exceeds its neighbours by >25% of
their standard deviation. If the gap statistic does not exhibit a clear maximum, the
cluster number at which the gap statistic starts to saturate
should be used as input for the k-means clustering. Given the number of clusters,
k-means clustering of the distance matrix is performed, and cluster reproducibility
is assessed by bootstrapping using the clusterboot function of the R package fpc.
The algorithm computes Jaccard's similarity to quantify cluster reproducibility. If
more than a single cluster has a Jaccard's similarity lower than 0.5, the clustering
should be repeated with fewer clusters. Importantly, the outlier identification step
of the algorithm will correct for an underestimation of the actual cluster number,
and it is thus recommended to start with a conservative estimate.
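A compact Python sketch of this clustering step follows, assuming NumPy only. Cells are clustered by k-means on the rows of the 1 − Pearson correlation matrix, and the cluster number is chosen with a Tibshirani-style gap statistic using uniform reference data drawn from the bounding box of the data. The stopping rule, the first k with Gap(k) ≥ Gap(k+1) − 0.25·s(k+1), is our reading of the >25% criterion; the bootstrapped Jaccard check (clusterboot in R) is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def correlation_distance(expr):
    """1 - Pearson correlation between cells; expr is genes x cells."""
    return 1.0 - np.corrcoef(expr, rowvar=False)


def kmeans(x, k, rng, n_init=10, n_iter=100):
    """Lloyd k-means with k-means++ seeding; rows of x are the points.

    Returns (labels, within-cluster sum of squares) of the best of n_init runs.
    """
    best_labels, best_w = None, np.inf
    for _ in range(n_init):
        centres = [x[rng.integers(len(x))]]
        for _ in range(1, k):  # k-means++: favour points far from existing centres
            d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
            centres.append(x[rng.choice(len(x), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            labels = np.argmin(((x[:, None, :] - centres[None]) ** 2).sum(axis=2), axis=1)
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        w = sum(((x[labels == j] - centres[j]) ** 2).sum() for j in range(k))
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels, best_w


def gap_statistic(x, k_max, rng, n_ref=20):
    """Gap(k) = mean log W_k of uniform reference data minus log W_k of the data."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    gaps, sds = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(x, k, rng)[1])
        ref = np.array([np.log(kmeans(rng.uniform(lo, hi, size=x.shape), k, rng)[1])
                        for _ in range(n_ref)])
        gaps.append(ref.mean() - log_w)
        sds.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    return np.array(gaps), np.array(sds)


def choose_k(gaps, sds, se_factor=0.25):
    """First k with Gap(k) >= Gap(k+1) - se_factor * s(k+1)."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - se_factor * sds[i + 1]:
            return i + 1
    return len(gaps)
```

Starting from a conservative k, as recommended above, is cheap here because later outlier detection can split clusters further.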

Identification of outlier cells. To identify outlier cells within each cluster, the
algorithm evaluates the transcript count variability of every gene across all cells in this
cluster. The expected baseline level of expression variability, quantified by the
transcript count variance, is inferred from the ensemble of all cells. A second-order
polynomial is fitted to the transcript count variance as a function of the average
transcript count in logarithmic space (Extended Data Fig. 4a). In comparison to a
linear regression, the polynomial significantly improves the fit, as tested by
ANOVA model comparison for the data sets presented here
(ANOVA P < 2.2 × 10−16 for both data sets). For the random organoid cells, the
residual sum of squares was reduced by 20% when using the second-order
polynomial instead of the linear regression. This polynomial serves as an estimate of
the expected variance–mean dependence under the assumption that the majority
of genes do not exhibit cluster- (or cell-type-) specific expression. Next, each cluster
is screened for outlier cells by computing the transcript count probability in each
cell for a given gene from a negative binomial distribution defined by the average
transcript count of this gene across all cells in the cluster and the expected variance
computed by the second-order polynomial. Assuming a lower limit of Poissonian
noise, values of the expected variance lower than the mean are replaced by the
mean. In practice, this does not happen, and the regression yields noise estimates
well above the Poissonian limit for all data sets analysed so far. If the multiple
testing (Benjamini–Hochberg) corrected transcript count probability of a specified
number of genes (two for our data) is below a defined probability threshold
(<10−3 for our data) in a given cell, this cell is considered an outlier. The total
number of outliers can be plotted as a function of the probability threshold
(Extended Data Fig. 4b); the threshold should be chosen such that it separates the tail
of this distribution from the bulk behaviour.
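The outlier test can be sketched in Python with NumPy and the standard library. The sketch assumes a genes × cells matrix, fits the second-order polynomial in log–log space over all genes, and parameterizes the negative binomial by mean μ and variance σ² (size r = μ²/(σ² − μ), falling back to Poisson when σ² ≤ μ). Taking the smaller of the two tail probabilities as each gene's transcript count probability, applying the Benjamini–Hochberg correction across genes within each cell, and the function names are all assumptions, not the published R code.

```python
import math

import numpy as np


def fit_variance_model(expr):
    """Fit log(var) as a second-order polynomial in log(mean) over all genes (genes x cells)."""
    mean, var = expr.mean(axis=1), expr.var(axis=1, ddof=1)
    ok = (mean > 0) & (var > 0)
    coef = np.polyfit(np.log(mean[ok]), np.log(var[ok]), 2)
    return lambda mu: np.exp(np.polyval(coef, np.log(mu)))


def nb_cdf(x, mu, var):
    """P(X <= x) for a negative binomial with mean mu and variance var (Poisson if var <= mu)."""
    total = 0.0
    for k in range(int(x) + 1):
        if var <= mu:
            log_p = k * math.log(mu) - mu - math.lgamma(k + 1)
        else:
            size = mu * mu / (var - mu)
            prob = size / (size + mu)
            log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                     + size * math.log(prob) + k * math.log(1.0 - prob))
        total += math.exp(log_p)
    return min(total, 1.0)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * len(p) / np.arange(1, len(p) + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty_like(p)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def find_outliers(expr, labels, var_model, p_thr, min_genes=2):
    """Indices of cells with >= min_genes genes whose adjusted probability is below p_thr."""
    outliers = []
    for c in np.unique(labels):
        cells = np.flatnonzero(labels == c)
        mu = expr[:, cells].mean(axis=1)
        var = np.maximum(var_model(mu), mu)  # Poissonian lower limit on the noise
        for j in cells:
            pvals = []
            for g in range(expr.shape[0]):
                x = int(round(expr[g, j]))
                lower = nb_cdf(x, mu[g], var[g])              # P(X <= x)
                upper = 1.0 - nb_cdf(x - 1, mu[g], var[g])    # P(X >= x)
                pvals.append(min(lower, upper))
            if np.sum(benjamini_hochberg(pvals) < p_thr) >= min_genes:
                outliers.append(int(j))
    return outliers
```

Re-running `find_outliers` over a range of `p_thr` values gives the outlier-count curve used to choose the threshold.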

Inference of final clusters. Given the set of outlier cells, the final clusters, which should
largely correspond to different cell types or states, are inferred. To this end, outlier
cells are first merged into outlier clusters if their transcriptome correlation exceeds
the 75%-quantile of the distribution of cell-to-cell correlations within the original
clusters after outlier removal. Subsequently, new cluster centres are computed for
the remaining original and the new outlier clusters by averaging transcript counts
within these clusters, and each cell is reassigned to the most highly correlated
cluster centre.
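A minimal Python sketch of this merging and reassignment step, under stated assumptions: outliers are grouped greedily, each joining the first outlier cluster that already contains a cell correlated with it above the 75% quantile (a single-linkage reading of the rule), and cluster centres are plain averages of transcript counts. Function and parameter names are hypothetical.

```python
import numpy as np


def final_clusters(expr, labels, outliers, quantile=0.75):
    """Merge outliers into new clusters, then reassign every cell to its most correlated centre.

    expr: genes x cells; labels: initial k-means labels; outliers: list of cell indices.
    """
    corr = np.corrcoef(expr, rowvar=False)
    outlier_set = set(outliers)
    within = []
    for c in np.unique(labels):
        members = [i for i in np.flatnonzero(labels == c) if i not in outlier_set]
        block = corr[np.ix_(members, members)]
        within.extend(block[np.triu_indices(len(members), k=1)])
    cutoff = np.quantile(within, quantile)
    groups = []  # greedy grouping: join the first group holding a sufficiently similar cell
    for i in outliers:
        for group in groups:
            if any(corr[i, j] > cutoff for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    new_labels = labels.copy()
    for offset, group in enumerate(groups, start=1):
        new_labels[group] = labels.max() + offset
    ids = np.unique(new_labels)
    centres = np.column_stack([expr[:, new_labels == k].mean(axis=1) for k in ids])
    n = expr.shape[1]
    r = np.corrcoef(np.hstack([expr, centres]), rowvar=False)[:n, n:]  # cells x centres
    return ids[np.argmax(r, axis=1)]
```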

Two-dimensional representation of cell type maps. For visual inspection of cell
clusters and associated cell types, we apply a dimensional reduction of cell-to-cell
distances as computed by the distance matrix (see above) using a machine learning
algorithm termed t-distributed stochastic neighbour embedding (t-SNE). Briefly,
this algorithm converts the original point-to-point distance distribution into a
distribution of distances in a lower-dimensional map, modelled by a Student's
t-distribution. The location of all points in the map is
determined by a stochastic minimization of the Kullback–Leibler divergence of
the original distances with respect to the mapped distances.
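For reference, the standard t-SNE formulation (van der Maaten and Hinton) is shown below. We assume that d_ij are the RaceID distances (1 − Pearson correlation) between cells i and j, N is the number of cells, σ_i is a per-cell bandwidth set by the perplexity, and y_i are the two-dimensional map coordinates; the exact implementation and parameters used for the figures are not specified here.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp(-d_{ij}^2 / 2\sigma_i^2)}{\sum_{k\neq i}\exp(-d_{ik}^2 / 2\sigma_i^2)},
  \qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N},\\
q_{ij} &= \frac{(1 + \lVert y_i - y_j\rVert^2)^{-1}}{\sum_{k\neq l}(1 + \lVert y_k - y_l\rVert^2)^{-1}},\\
C &= \mathrm{KL}(P\,\Vert\,Q) = \sum_{i\neq j} p_{ij}\log\frac{p_{ij}}{q_{ij}},\\
\frac{\partial C}{\partial y_i} &= 4\sum_{j}(p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j\rVert^2)^{-1}.
\end{aligned}
```

The heavy-tailed Student's t kernel in q_ij lets moderately dissimilar cells sit far apart in the map, which helps keep small clusters of rare cells visually separated.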

Identification of differentially expressed genes. To identify genes that were on
average up- or downregulated within a cluster compared to the ensemble of all
cells, the fold change in absolute transcript counts was computed after normalizing
the total transcript count in each cell to the median transcript count within the cluster
under consideration. As shown previously for single-cell sequencing data,
median-normalized transcript levels exhibit Poissonian noise for most genes, with small
deviations at high expression. A P value for significant up- or downregulation was
therefore computed based on Poissonian statistics and corrected for multiple testing
by the Benjamini–Hochberg method.
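One way to implement this test is sketched in Python below. It assumes that, for each gene, the normalized transcripts summed over the cluster's cells follow a Poisson distribution whose mean is the cluster size times the ensemble average, with one-sided tails for up- and downregulation. This formulation and the function names are our assumptions; the RaceID R script may compute the P values differently.

```python
import math

import numpy as np


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), lam > 0."""
    if k < 0:
        return 0.0
    total = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(int(k) + 1))
    return min(total, 1.0)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * len(p) / np.arange(1, len(p) + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty_like(p)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def differential_expression(expr, in_cluster):
    """Fold changes and BH-adjusted Poisson P values (up, down) for one cluster.

    expr: genes x cells with positive values (e.g. after adding the pseudocount);
    in_cluster: boolean mask over cells.
    """
    totals = expr.sum(axis=0)
    norm = expr / totals * np.median(totals[in_cluster])  # scale to the cluster's median total
    n = int(in_cluster.sum())
    mean_all = norm.mean(axis=1)
    fold = norm[:, in_cluster].mean(axis=1) / mean_all
    p_up, p_down = [], []
    for g in range(expr.shape[0]):
        observed = int(round(norm[g, in_cluster].sum()))  # summed counts in the cluster
        expected = n * mean_all[g]                         # Poisson mean under the ensemble
        p_up.append(1.0 - poisson_cdf(observed - 1, expected))
        p_down.append(poisson_cdf(observed, expected))
    return fold, benjamini_hochberg(p_up), benjamini_hochberg(p_down)
```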

Code availability. The RaceID algorithm is supplied as an R script
(Supplementary Data 1) along with sample code (Supplementary Data 2), an extensive
reference manual (Supplementary Data 3) and sample data (Supplementary Table 6),
corresponding to the random organoid cell transcriptome data analysed in
this paper. Bug fixes and updates of RaceID can be downloaded from
https://github.com/dgrun/RaceID.
Extended Data Figure 1 | Sequencing statistics. a, Histogram of the number
of sequenced transcripts per cell. The median (red line) is 8,559. The 288 cells
were sequenced on two lanes to a total depth of 106,950,038 reads. Of those,
32,694,069 (31%) were mapped with a valid cell-specific barcode. b, Histogram
of the total number of reads per cell divided by the total number of sequenced
transcripts as counted with unique molecular identifiers. The average level of
oversequencing across all genes is 6.9-fold (red line).
Extended Data Figure 2 | Clustering reveals major transcriptome
differences between intestinal cells. a, Dendrogram obtained by hierarchical
clustering of the transcriptome correlation matrix of 238 intestinal cells that
survived all filtering steps, using the Euclidean distance metric. At least three
distinct groups of cells can be recognized. b, Gap statistic of k-means clustering
of the correlation matrix as a function of the cluster number. The gap statistic
reflects the difference of the average within-cluster distance of points in
uniformly distributed data and the actual data. The first local maximum
provides a good estimate for the number of clusters that achieves optimal
separation of the data into clusters. Data points and error bars represent mean
and standard deviation across 50 bootstrap samples. For the intestinal cells,
six clusters were predicted on the basis of the gap statistic.
enteroendocrine marker Chga (b), the goblet cell marker Muc2 (c), the Paneth
cell marker Lyz1 (d), for transcript counts aggregated across all ribosomal genes
… transcript counts of Chga on a logarithmic scale, since the dynamic range of
this gene was very large. Dim, dimension.
Extended Data Figure 4 | Identification of rare cell types with RaceID.
a, Variance of transcript count across the entire ensemble of sequenced
intestinal cells as a function of mean expression. The red line represents a
second-order polynomial (upper left corner) that was fitted to the data.
Assuming that a large number of genes follow a similar transcript count
distribution across different cell types, this function can be used to estimate the
parameters of a negative binomial distribution that represents a background model for
the expected transcript count variability at a given mean expression. The
probability of observing a given transcript count in a particular cell of a cluster
can be computed using this distribution with the average expression within
this cluster as input. If the expression of at least two genes has a probability
<10−3 after multiple testing correction, a cell is considered an outlier.
b, Number of outlier cells as a function of the probability threshold. The
threshold used in this study (10−3) is indicated (red broken line). c, d, RaceID of
pool-and-split controls reflects a low false-positive rate (see Supplementary
Note). c, t-SNE map of 93 pool-and-split controls. RaceID identifies only a
single large cluster with no outliers. d, Outlier probability for all pool-and-split
controls. The P value of all cells is higher than the default threshold
for outlier identification (10−3). e, RaceID on a defined mixture of cells
demonstrates high specificity (see Supplementary Note). RaceID clusters for a
mixture of cells comprising 75 random organoid cells, 5 mouse embryonic
stem cells (mESCs) and 5 mouse embryonic fibroblasts (MEFs). Two out of five
MEFs did not pass the filtering criteria (>3,000 transcripts per cell). EC,
enterocytes; ECpr, enterocyte precursors; TA, transit amplifying cells; EE,
enteroendocrine cells; EEpr, enteroendocrine precursors; P, Paneth cells. Dim,
dimension.
Extended Data Figure 5 | Benchmarking RaceID. a, b, To benchmark the
RaceID algorithm, we compared it to a previously published method developed to
distinguish cell types from the spleen® in thousands of sequenced single cells.
We implemented the method according to the specification provided in the 
original paper. A shortcoming of the method is that it has to be initialized 
with an expected number K of cell types. Running the algorithm with K = 6 
(a) yields results very similar to k-means clustering with K = 6 (Fig. 1c). 
However, when the algorithm is run with a larger number of cell types,
e.g. K = 20 (b), rare cells of the secretory lineage still cannot be separated, while
clusters corresponding to relatively uniform cell types fall apart. We conclude
that this algorithm performs well for more abundant cell types but cannot
identify rare cell types. c, Principal component analysis (PCA) of the
transcriptome similarities. The cell types identified by RaceID are highlighted.
The first two principal components separate only the major groups of
enterocytes, transit amplifying cells and secretory cells. d, To demonstrate that
our method is not affected by technical noise due to varying detection efficiency
across individual cells, we downsampled the transcriptome of all cells with
>3,000 transcripts to the same size, given by the minimum total transcript
count across all cells that passed the filtering, and repeated the outlier
identification. The t-SNE map shows all cell types identified by this strategy, and
the results are highly consistent with the normalization-based approach. e, The
t-SNE map shows the results of RaceID run with relaxed filtering constraints 
(>1,000 transcripts per cell and only genes with more than three transcripts 
in at least one cell) as used for the Reg4-positive organoid cells. All the rare 
secretory cell types identified with the original settings were recovered. The 
different stages of enterocyte differentiation are also apparent. EC, enterocytes; 
EClpr, late enterocyte precursors; ECepr, early enterocyte precursors; TA, 
transit amplifying cells; G, goblet cells; EE, enteroendocrine cells; EElpr, late 
enteroendocrine precursors; EEepr, early enteroendocrine precursors; P, 
Paneth cells; T, tuft cells. f, Same as Extended Data Fig. 4a, but cell-cycle-related
genes are highlighted as blue circles. This set comprises all genes whose
associated “biological process” Gene Ontology terms contain “cell cycle”.
Cell-cycle-related genes do not show increased variability and
are thus unlikely to lead to false positives in the outlier detection by RaceID.
Dim, dimension.
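The downsampling control in panel d can be sketched as drawing, for each retained cell, a fixed number of transcripts without replacement, so that every cell ends up with the same total. This is a minimal illustration under assumptions: transcripts are sampled uniformly, the cell filter is a strict `>` on the total count, and the function names are hypothetical. The caption does not describe the sampling procedure in more detail.

```python
import random
from collections import Counter
from typing import Dict

Cell = Dict[str, int]   # gene name -> transcript count


def downsample_cell(counts: Cell, target: int, rng: random.Random) -> Cell:
    """Draw `target` transcripts without replacement from one cell's transcript pool."""
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    if target > len(pool):
        raise ValueError("target exceeds the cell's total transcript count")
    drawn = Counter(rng.sample(pool, target))
    # Report every original gene, with zero where none of its transcripts were drawn.
    return {gene: drawn.get(gene, 0) for gene in counts}


def downsample_all(cells: Dict[str, Cell], min_total: int = 3000,
                   seed: int = 0) -> Dict[str, Cell]:
    """Keep cells with more than `min_total` transcripts, then downsample them all
    to the smallest total transcript count among the retained cells."""
    rng = random.Random(seed)
    kept = {cid: c for cid, c in cells.items() if sum(c.values()) > min_total}
    target = min(sum(c.values()) for c in kept.values())
    return {cid: downsample_cell(c, target, rng) for cid, c in kept.items()}
```

The cell with the smallest total is returned unchanged, and every other cell loses transcripts at random in proportion to their abundance, which removes differences in detection depth without rescaling counts.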
Extended Data Figure 6 | Purification of Reg4-positive cells from reporter
mouse organoids. In total, 288 cells were sequenced. Ninety-six and 192
cells were analysed from two separate sorting experiments. a, Single small
intestine cells derived from the wild-type (WT, upper panel) and Reg4-dsRed
(lower panel) mice were sorted by FACS. Live cells were gated as DsRed-positive
(lower panel, gate denoted by black rectangle, DsRed-High). SSC-W,
side scatter width; DsRed-H, DsRed height. A median of 2,813
transcripts per cell was quantified. After filtering out cells with <1,000
transcripts and genes with <3 transcripts in all cells or >2,000 transcripts in a
single cell, 161 cells remained for analysis. b, c, In the t-SNE maps of
Reg4-positive cells, the transcript counts of dsRed, driven by a Reg4 promoter (b), and
of endogenous Reg4 (c) are colour-coded on a logarithmic (log) scale. Assuming a
previously estimated sensitivity of our sequencing protocol’, we measure
~10% of all expressed transcripts. Reporter expression is about eightfold
reduced in comparison to endogenous Reg4, but expression of both the reporter
gene and Reg4 is strongest in the Tac1/Tph1-expressing enteroendocrine cells,
while expression in Paneth-like cells is reduced. Notably, expression of Reg4
and the reporter gene is also reduced in the Cck-positive enteroendocrine cells,
similar to Chga. d, The t-SNE map shows the results of RaceID with more 
stringent filtering constraints (>3,000 transcripts per cell and only genes with a 
minimum of five transcripts in at least one cell) as used for the random 
organoid cells. Although the total number of cells is reduced to 135, most 
subtypes and rare cells identified with the relaxed settings are still observed, 
including the Afp- and Alb-expressing subtypes, the Ucn3-positive cells, the
Cck-positive cells, the contamination by enterocytes and transit amplifying cells
as well as the different Paneth cell states. e-h, Marker gene expression reveals
intestinal cell types among Reg4-positive cells. In the t-SNE maps of
Reg4-positive cells, the transcript count of known marker genes is colour-coded on a
logarithmic (log2) scale. Shown are maps for transcript counts aggregated
across all defensin genes, which are highly expressed in Paneth cells (e), for
transcript counts aggregated across all ribosomal genes (f), for the enterocyte
marker Apoa1 (g) and for the enteroendocrine marker Chga (h). Dim,
dimension.
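The cell and gene filters quoted in panel a can be sketched as a small Python function over a sparse count table. It assumes that the cell filter is applied first and the gene filters are then evaluated on the retained cells. The caption does not state this order, and the function name is hypothetical. "Genes with <3 transcripts in all cells" is read here as genes whose maximum count across cells is below 3.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, int]]   # cell id -> {gene: transcript count}


def filter_counts(counts: Matrix, min_cell_total: int = 1000,
                  min_gene_peak: int = 3, max_gene_peak: int = 2000) -> Matrix:
    """Drop low-count cells, then genes that never reach `min_gene_peak` transcripts
    in any retained cell or exceed `max_gene_peak` transcripts in some cell."""
    cells = {c: g for c, g in counts.items() if sum(g.values()) >= min_cell_total}
    genes = set().union(*(g.keys() for g in cells.values()))
    keep = set()
    for gene in genes:
        peak = max(g.get(gene, 0) for g in cells.values())
        if min_gene_peak <= peak <= max_gene_peak:
            keep.add(gene)
    return {c: {gene: n for gene, n in g.items() if gene in keep}
            for c, g in cells.items()}
```

The same function covers the stricter settings used elsewhere in these captions, for example `filter_counts(data, min_cell_total=3000, min_gene_peak=5, max_gene_peak=500)` for the Lgr5-GFP-positive cells.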
positive cells. In the t-SNE maps of Reg4-positive cells, the transcript count of … (b), secretin (Sct) (c), neurotensin (Nts) (d), proglucagon (Gcg) (e), and
Extended Data Figure 8 | Heterogeneity of substance P-producing
enteroendocrine cells. In the t-SNE maps of Reg4-positive cells, the transcript
count of genes specifically expressed in subtypes of enteroendocrine cells is
colour-coded on a logarithmic (log2) scale. Shown are maps for tachykinin
(Tac1), which encodes substance P (a), tryptophan hydroxylase 1 (Tph1)
(b), urocortin 3 (Ucn3) (c), alpha-fetoprotein (Afp) (d), and VGF nerve growth
factor inducible (Vgf) (e). f, The heat map shows the average expression of
inferred marker genes for the enterochromaffin subtypes (clusters 2, 3 and 7).
To show all genes on the same scale, each gene's average expression levels in
the three clusters were normalized to sum to one. Expression levels are shown for six
cluster 2 markers and for six cluster 7 markers. Cluster 3 is distinguished by
downregulation of both sets. Cluster 5 (not shown here) does not have specific
markers and differs from the other clusters by lower expression of mature
enterochromaffin markers (Chga, Chgb, Tac1, Tph1). This cluster likely
comprises immature enterochromaffin cells. Dim, dimension.
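The heat-map normalization in panel f can be read as scaling, for each gene, its three cluster averages so that they sum to one. Under this per-gene interpretation, which is an assumption, it reduces to a row-wise division. The gene names in the usage note are hypothetical.

```python
from typing import Dict, List


def normalize_per_gene(avg: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each gene's cluster averages so they sum to one (all-zero rows stay zero)."""
    out = {}
    for gene, values in avg.items():
        total = sum(values)
        out[gene] = [v / total if total > 0 else 0.0 for v in values]
    return out
```

For instance, a hypothetical gene with averages `[2.0, 1.0, 1.0]` in clusters 2, 3 and 7 becomes `[0.5, 0.25, 0.25]`. This makes weakly and strongly expressed markers directly comparable on one colour scale.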
Extended Data Figure 9 | Single-molecule FISH and immunofluorescence
experiments confirm expression of markers for enteroendocrine cell
subpopulations in the mouse small intestine. a, b, Small intestine cryosections
were hybridized with smFISH probe libraries. Scale bar, 10 µm. a, The novel
marker Ucn3 was found in a small number of Tac1-positive cells. Probes
against Tac1, conjugated with TMR (upper panel, red), and against Ucn3,
conjugated with Cy5 (lower panel, green), were hybridized to the tissue
sections. Dashed lines indicate cell borders. A cell co-expressing the two
markers (Tac1+/Ucn3+) is shown in the left column. A cell expressing Tac1
but not Ucn3 (Tac1+/Ucn3−) is shown in the right column. b, The novel
marker Vgf is expressed by a subpopulation of Tph1-positive cells. Probes
against Tph1, conjugated to TMR (upper panel, red), and against Vgf,
conjugated to Cy5 (lower panel, green), were used for hybridization. Cell
borders were stained by phalloidin-AlexaFluor 488 (not shown). Dashed lines
demarcate cell borders. A Tph1-positive cell expressing Vgf (Tph1+/Vgf+) is
shown in the left column. An example of a Tph1-positive cell expressing no Vgf
(Tph1+/Vgf−) is shown in the right column. c, d, Immunostaining was
performed on cryosections of mouse small intestinal tissue. Scale bar, 20 µm.
c, Expression of UCN3 was observed in a few TAC1-positive cells within the
jejunum. Frozen tissue sections were indirectly stained with anti-UCN3
(left panel, red) and anti-TAC1 (middle panel, light blue) antibodies. Nuclei
were visualized with DAPI (dark blue). A cell expressing both markers
(TAC1+/UCN3+) is shown in the upper row. A cell positive for TAC1 but not
UCN3 (TAC1+/UCN3−) is shown in the lower row. The arrowhead points
at the UCN3-negative cell. d, VGF is expressed by a subpopulation of
TAC1-positive jejunal and ileal cells. VGF (left panel, red) and TAC1 (second panel
from the left, grey) expression was visualized with indirect immunostaining.
Expression of AFP was detected using a directly conjugated antibody (second
panel from the right, green). Nuclei were counterstained with DAPI (blue). A
TAC1-positive cell expressing VGF and AFP (TAC1+/VGF+/AFP+) is shown
in the upper panel. An example of a TAC1-positive cell expressing no VGF or
AFP (TAC1+/VGF−/AFP−) is shown in the lower panel. Arrowheads point at
the VGF- and AFP-negative cell.
Extended Data Figure 10 | Purification of Lgr5-GFP+ cells from reporter
mouse” organoids. Single small intestinal organoid cells, derived from
Lgr5-GFP mice, were sorted by FACS. In total, 96 cells were sequenced from a single
experiment on four lanes with 31,065,854 reads in total, of which 33% could be
mapped to the transcriptome. Every UMI-derived transcript was sequenced on
average 6.4 times. A median of 9,626 transcripts per cell was
quantified. After filtering out cells with <3,000 transcripts and genes with <5
transcripts in all cells or >500 transcripts in a single cell, 74 cells remained for 
analysis. a, The t-SNE map shows the cell types identified by RaceID. Only a 
single predominant cell type and few outliers were observed. Cluster 1 
comprises intestinal stem cells while the few outliers represent Paneth and 
enteroendocrine cells. b-d, The t-SNE maps show expression of the stem cell 
marker Lgr5 (b), the stem cell marker Lrig1 (c), and the +4 niche marker Hopx 
(d). All markers are homogeneously expressed across all cells at low transcript
counts. We observed only marginal expression of the stem cell marker Bmi1 in
a few cells and did not observe expression of Tert in any of the cells, which is
likely owing to the limited sensitivity of the method. The RaceID results
indicate that Lgr5-positive intestinal stem cells represent a uniform population 
of cells. e, f, Combined analysis of random organoid and Lgr5-positive cells 
using RaceID. The initial clusters of the random organoid cells are conserved. 
The Lgr5-positive cells give rise to a uniform group that merges with the CBC/ 
TA cluster from the random organoid cells (cluster 2). Shown is a heat map 
representation (e) and a t-SNE map (f) of the cell-to-cell transcriptome 
distances. Clusters are indicated by the same colours along the axes of the heat 
map (e) and for individual data points in the t-SNE map (f). Dim, dimension. 
Distinct EMT programs control normal mammary
stem cells and tumour-initiating cells


Xin Ye, Wai Leong Tam, Tsukasa Shibue, Yasemin Kaygusuz, Ferenc Reinhardt, Elinor Ng Eaton & Robert A. Weinberg


Tumour-initiating cells (TICs) are responsible for metastatic 
dissemination and clinical relapse in a variety of cancers’. 
Analogies between TICs and normal tissue stem cells have led to 
the proposal that activation of the normal stem-cell program within 
a tissue serves as the major mechanism for generating TICs*’. 
Supporting this notion, we and others previously established that 
the Slug epithelial-to-mesenchymal transition-inducing transcription
factor (EMT-TF), a member of the Snail family, serves as a
master regulator of the gland-reconstituting activity of normal
mammary stem cells, and that forced expression of Slug in collaboration
with Sox9 in breast cancer cells can efficiently induce
entrance into the TIC state*. However, these earlier studies focused 
on xenograft models with cultured cell lines and involved ectopic 
expression of EMT-TFs, often at non-physiological levels. Using 
genetically engineered knock-in reporter mouse lines, here we show 
that normal gland-reconstituting mammary stem cells”""' residing in 
the basal layer of the mammary epithelium and breast TICs originating
in the luminal layer exploit the paralogous EMT-TFs Slug and
Snail, respectively, which induce distinct EMT programs. Broadly,
our findings suggest that the seemingly similar stem-cell programs
operating in TICs and normal stem cells of the corresponding normal
tissue are likely to differ significantly in their details.

To define the functions of endogenously encoded, physiologically 
regulated Snail family EMT-TFs in breast cancer pathogenesis in vivo, 
we generated knock-in IRES (internal ribosomal entry site)-YFP 
(yellow fluorescent protein) reporters for Slug (also called Snai2) and 
Snail (also called Snai1) (Fig. 1a, b). These knock-in reporters faithfully
reflected the expression of the endogenous genes (Extended Data
Fig. 1a, b), and enabled the isolation of Slug+ or Snail+ cells by
fluorescence-activated cell sorting (FACS) (Extended Data Fig. 6e-h).

Using these reporters, we found that Slug was expressed at higher
levels in the normal mammary stem cell (MaSC)-enriched basal
mammary epithelial cells (MECs) than in the stromal fibroblasts
surrounding the mammary ducts. In contrast, the EMT-TFs Snail, Twist
and Zeb1 were expressed in stromal fibroblasts but not in either basal
or luminal MECs (Fig. 1c-e, g and Extended Data Fig. 1c-f). In addition
to the differential expression of EMT-TFs, the MaSC-enriched
basal MECs displayed intermediate expression levels of both epithelial 
and mesenchymal markers (Fig. 1f, g and Extended Data Fig. 1g). 
Hence, Slug expression in the normal basal MECs was associated with 
only a partial conversion to the mesenchymal state. 

Given the differential expression patterns of Slug and Snail, we 
undertook to analyse their expression during tumour development 
using the MMTV-PyMT transgenic model of mammary tumour 
formation, which mirrors the multi-step progression of human breast 
cancers beginning from hyperplastic lesions to high-grade carcinomas 
that spontaneously metastasize to the lungs’. In the initially formed 
hyperplastic lesions, we noted a marked reduction in the relative
frequency of Slug-YFP+ cells compared to normal glands, contrary to the
hypothesis that activation of the Slug EMT-TF might be the preferred
mechanism to generate TICs. These Slug-YFP+ cells were
cytokeratin 14-positive (CK14+) (Fig. 2a and Extended Data Fig. 2f), indicating that Slug
expression was still confined to cells of the basal lineage, as was the case
within the normal ducts. In these early-stage lesions, we detected for
the first time Snail-YFP expression in a small fraction of the neoplastic
cells displaying CK8+Slug−Zeb1− luminal characteristics (Fig. 2a, b
and Extended Data Fig. 2a-c).

As these early-stage tumours progressed to high-grade carcinomas, 
the Slug+ cells remained largely confined to the basal sectors of each
epithelial island, whereas the Snail+ cancer cells were sometimes
fully detached from the epithelial islands and exhibited an elongated
mesenchymal morphology (Fig. 2c). We found that virtually all
Snail-YFP+ tumour cells had lost E-cadherin expression and activated
expression of the Zeb1 EMT-TF; in contrast, most Slug-YFP+ tumour
cells retained junctional E-cadherin and lacked Zeb1 expression
Figure 1 | Differential expression of Slug and Snail in normal mammary
glands. a, b, Targeting strategies for the knock-in alleles. c, d, Normal
mammary glands of the indicated genotypes were stained for the indicated
proteins. e, FACS histograms showing relative expression levels of the YFP
reporters in normal adult mammary cell subpopulations. f, Normal mammary
gland stained for E-cadherin (E-cad) and Slug. Arrowheads indicate the
junctions between basal MECs. Right panel: quantification of anti-E-cadherin
staining intensities at the junctions between luminal MECs and basal MECs
in a representative mammary gland (mean ± s.d., n = 20 cell junctions,
*P < 0.00001). Data represent analyses of six glands. g, Representative
qRT-PCR quantification of the indicated EMT markers (mean ± s.e.m.,
technical triplicates). Levels in luminal MECs were set to one. Data represent
three independent experiments. All scale bars indicate 20 µm.
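The caption for panel g does not say how the qRT-PCR levels were normalized. A common choice, assumed here, is the comparative Ct (2^-ΔΔCt) method with a housekeeping reference gene, using luminal MECs as the calibrator so that their level is one. The sketch below also assumes roughly 100% amplification efficiency and pairs target and reference Ct values by replicate index. The function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_expression(target_ct: Sequence[float], ref_ct: Sequence[float],
                        calib_target_ct: Sequence[float],
                        calib_ref_ct: Sequence[float]) -> Tuple[float, float]:
    """Fold change versus a calibrator sample via 2^-ddCt (assumes ~100% efficiency).

    Returns the mean and s.e.m. over technical replicates of the sample of interest.
    """
    # dCt of the calibrator (e.g. luminal MECs): target minus housekeeping gene.
    calib_dct = mean(calib_target_ct) - mean(calib_ref_ct)
    # One fold-change value per technical replicate of the sample.
    folds = [2.0 ** -((t - r) - calib_dct) for t, r in zip(target_ct, ref_ct)]
    sem = stdev(folds) / math.sqrt(len(folds)) if len(folds) > 1 else 0.0
    return mean(folds), sem
```

A sample whose target gene crosses the threshold three cycles earlier, relative to the reference gene, than in the calibrator comes out at about eightfold, and the calibrator compared with itself comes out at exactly one.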
Figure 2 | Differential expression of Slug and Snail in mammary tumours.
a, b, Hyperplastic mammary lesions of the indicated genotypes were stained
for the indicated proteins. Arrow in a indicates Snail-YFP and CK8
double-positive cells. Arrows and arrowheads in b indicate Snail-YFP and cytokeratin
double-positive cells, and Slug-positive cells, respectively. c, d, High-grade
carcinomas of the indicated genotypes were stained for the indicated proteins.
Arrows indicate Zeb1 and cytokeratin double-positive cells (c) and the
junctions between YFP-positive carcinoma cells (d). e, Snail^YFP/+;MMTV-PyMT
tumours were stained for the indicated proteins. Arrows indicate
Snail-YFP-positive carcinoma cells. f, Tumour organoids of the indicated genotypes were
stained for the indicated proteins. Images represent three independent
experiments. All scale bars indicate 10 µm.


(Fig. 2c, d and Extended Data Fig. 2d). Therefore, Snail rather than 
Slug is associated with more complete expression of mesenchymal 
traits in mammary tumours. 

Notably, as tumours progressed, the Snail-YFP+ cells
gradually acquired basal CK14 expression and lost luminal CK8
expression (Fig. 2e and Extended Data Fig. 2c, e), consistent with
the proposal that in human breast carcinomas, aggressive cancer cells
exhibiting basal features can arise from luminal precursors'*"'’. To
compare the activation of Slug and Snail during such luminal-basal
transitions, we used an organoid culture system in which CK14 is
spontaneously activated as the tumour cells invade into a type I
collagen gel’*. We dissociated adenocarcinomas into tumour organoids
as previously described’*. These freshly isolated tumour organoids were
almost exclusively of luminal phenotype (CK8+CK14−) and lacked
both Slug-YFP and Snail-YFP expression (Extended Data Fig. 3a).
After 48 h in culture, CK14 expression was induced in tumour cells
at the invasive fronts of these organoids. Strikingly, this induction was 
tightly associated with Snail-YFP but not Slug-YFP activation (Fig. 2f 
and Extended Data Fig. 3b, c). 

Taken together, these analyses indicate that the EMT-TF that is 
activated in the MMTV-PyMT mammary tumours (that is, Snail) is 
distinct from the one expressed in the normal gland-reconstituting 
MaSCs (that is, Slug), and that expression of Snail rather than its 
paralogue Slug is associated with potent EMT activation and eventual 
acquisition of basal features (Extended Data Fig. 3d). 

To extend and generalize these observations further, we compared 
Slug, Snail and Zeb1 expression patterns in the MMTV-Neu’* and
BRCA1/p53-minus’” transgenic models of breast cancer development.
Snail activation had previously been associated with recurrence and
metastasis formation in the MMTV-Neu model””'. Consistently, we
found that Snail and Zeb1 are activated in ~1-2% of tumour
cells within MMTV-Neu tumours that are associated with metastatic
disease; these tumours showed no evidence of Slug expression (Extended
Data Fig. 4a-c). The BRCA1/p53-minus tumours were highly
heterogeneous and harboured areas with epithelial and glandular
phenotypes as well as areas with mesenchymal and invasive
phenotypes (Extended Data Fig. 4d, g). We found that Slug was
predominantly expressed by cells occupying the basal sectors of the gland-like
structures (echoing its expression in normal ducts), whereas Snail and 
Zeb1 were predominantly expressed in the invasive cells (Extended 
Data Fig. 4d-h). Hence, differential expression of Slug and Snail 
appears to be a conserved feature of mammary tumours driven by 
diverse oncogenic signals. 

We were curious whether differential expression of SLUG and 
SNAIL was also observable in human breast cancer cells, and therefore 
surveyed SLUG versus SNAIL expression in human breast cancer cell 
lines. Mirroring our observations in the mouse models, SNAIL but not 
SLUG was expressed in luminal breast cancer cells, and the two
EMT-TFs were expressed in distinct (but occasionally overlapping)
populations in basal breast cancer cells (Extended Data Fig. 5a, b). In addition
to these genetically unrelated breast cancer cell lines, we compared
SLUG and SNAIL expression in an MCF10A-based model of breast
cancer progression”. This model consists of the immortalized,
non-tumorigenic MCF10A human MECs, MCF10A cells transformed
with an H-RAS oncogene (MCF10A-Ras), and a cell line established
from an MCF10A-Ras cell-derived carcinoma (MCF10A-Ras-C). We
found that SLUG was expressed in MCF10A cells, but was
downregulated in MCF10A-Ras-C cells. In contrast, SNAIL was absent in
MCF10A cells, but underwent activation in MCF10A-Ras cells and
further upregulation in MCF10A-Ras-C cells (Extended Data Fig. 5c, d).

To summarize, these results indicated that the differential expression 
of Slug and Snail is a common feature of mammary tumours despite 
their different subtypes, genetic backgrounds and oncogenic drivers. 
Although Snail is absent in normal MECs, it often becomes activated 
during breast cancer progression. Indeed, SNAIL expression was
detected in ~80% of microdissected human invasive ductal carcinomas”.

Our observations raised the question of whether the Slug+, Snail+,
or yet other cell subpopulations within mammary tumours were
enriched in TICs. To address this issue, we developed a system that
allowed us to isolate these various subpopulations. Thus, we
FACS-purified premalignant EpCAM+ MECs from 4-5-week-old
Slug^YFP/+;MMTV-PyMT;RFP (red fluorescent protein) or
Snail^YFP/+;MMTV-PyMT;RFP animals, and thereafter implanted these cells into cleared
mammary fat pads of hosts that lacked these transgenes (Fig. 3a). The
implanted premalignant MECs first grew as rudimentary ductal
structures and then progressed over 6-7 months to form high-grade
carcinomas that metastasized to the lungs (Extended Data Fig. 6a, b).

We FACS-resolved the RFP+ carcinoma cells based on the expression
levels of the YFP (Slug or Snail) reporter and the EpCAM epithelial
marker. In the high-grade carcinomas and corresponding pulmonary
metastases, EpCAM expression was downregulated in 4-12% of the
carcinoma cells. These EpCAM^low cells had low Slug-YFP expression
(Fig. 3b and Extended Data Fig. 5c) but high Snail-YFP expression
(Fig. 3c and Extended Data Fig. 5d). In contrast, the EpCAM^high
populations were Snail-YFP− (Fig. 3c), and were composed of
Slug-YFP^low EpCAM^high and Slug-YFP^high EpCAM^high subpopulations
(Fig. 3b). Hence, EpCAM expression was inversely correlated only
with Snail expression.

Using quantitative reverse transcription PCR (qRT-PCR) analyses,
we confirmed that Snail expression was highest in the EpCAM^low
subpopulations, whereas Slug was enriched in EpCAM^high
subpopulations (Fig. 3d, e). As expected, strong induction of mesenchymal
markers and suppression of E-cadherin were only seen in the
EpCAM^low subpopulations (Fig. 3d, e). Using cell lines derived from
Slug^YFP/+;MMTV-PyMT and Snail^YFP/+;MMTV-PyMT tumours, we
also observed the segregation of Slug and Snail expression at the
protein level and associated Snail but not Slug expression with strong
induction of a mesenchymal phenotype (Extended Data Fig. 6e-h).

Figure 3 | Breast TICs express Snail. a, Schematic

Together, these observations demonstrated our ability to resolve
and isolate distinct tumour cell subpopulations with high levels of either
Slug (Slug^high) or Snail (Snail^high) from the same primary tumours
(Extended Data Fig. 7a), allowing us, in turn, to directly compare their
respective tumour-initiating activities. To this end, we FACS-purified
these subpopulations (Extended Data Fig. 6e-h) and implanted each at
limiting dilutions to score tumour formation. Overall, Snail^high
subpopulations contained proportions of TICs more than two orders of
magnitude higher than did the other subpopulations. In contrast, the
Slug-YFP^high cells were as deficient in tumour-initiating ability as the
Slug-YFP^low EpCAM^high cells (Extended Data Fig. 7b, c). Hence, Snail but not
Slug was tightly associated with a TIC phenotype.
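The text does not state how TIC proportions were computed from the limiting-dilution implants. A standard approach, assumed here, is the single-hit Poisson model P(tumour | d cells) = 1 − exp(−f·d), where f is the TIC frequency. Tools such as ELDA fit this same model. The sketch below maximizes the likelihood directly by bisection on log f. All names are hypothetical.

```python
import math
from typing import Sequence


def _score(f: float, doses: Sequence[float], n: Sequence[int], r: Sequence[int]) -> float:
    """Derivative of the single-hit log-likelihood with respect to f."""
    s = 0.0
    for d, ni, ri in zip(doses, n, r):
        x = f * d
        # Responders contribute r*d/(exp(x) - 1); negligible once x is large.
        s += (ri * d / math.expm1(x) if x < 700 else 0.0) - (ni - ri) * d
    return s


def estimate_tic_frequency(doses: Sequence[float], n: Sequence[int], r: Sequence[int],
                           lo: float = 1e-12, hi: float = 1.0, iters: int = 200) -> float:
    """Maximum-likelihood TIC frequency under P(tumour | d cells) = 1 - exp(-f*d).

    doses: cells implanted per site; n: sites implanted; r: sites that formed tumours.
    """
    if all(ri == 0 for ri in r):
        return 0.0                      # no tumours at any dose
    if all(ri == ni for ri, ni in zip(r, n)):
        return math.inf                 # every site formed a tumour: no finite estimate
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):              # bisection on log f; the score decreases in f
        mid = 0.5 * (log_lo + log_hi)
        if _score(math.exp(mid), doses, n, r) > 0:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))
```

The reciprocal 1/f gives the "1 in N cells" form usually reported. For two subpopulations, a ratio of their estimated f values above 100 corresponds to the "more than two orders of magnitude" difference described above. Confidence intervals, which ELDA also reports, are omitted from this sketch.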

To compare the TIC activities and metastatic powers of these
tumour cell subpopulations coexisting in the same primary tumours
in vivo, we FACS-purified each subpopulation from highly metastatic
carcinomas generated by Slug^YFP/+;MMTV-PyMT;RFP cells and
introduced them via the tail vein to gauge their respective abilities to seed
pulmonary metastases. Notably, the Snail^high cells consistently gave
rise to far more metastatic outgrowths than did the EpCAM^high
subpopulations. In particular, 40,000 Snail^high cells from a highly
metastatic primary tumour (Extended Data Fig. 8a) seeded on average ~90
large metastases in each animal. In contrast, 40,000 cells of the Slug^high
and Slug-YFP^low EpCAM^high subpopulations from the same tumour
seeded an average of only 3.6 and 2.2 metastases per animal,
respectively (Fig. 3f and Extended Data Fig. 8b). The Snail^high cells were also
far more metastatic than the other two populations when implanted
subcutaneously (Fig. 3g and Extended Data Fig. 8e-g). Interestingly,
the metastatic outgrowths formed by the Snail^high cells harboured
gland-like structures composed of both Slug+ and Slug− cells
(Extended Data Fig. 8c, d). Hence, the Snail^high cells that seeded
metastases were capable of differentiating within these outgrowths, thereby
regenerating the complex cellular hierarchy present in the original
primary tumours. These data also revealed that the TICs did not derive
from basal MaSC-like cells (that is, the Slug^high cells) but instead arose
in a different cell population.
We were curious whether correlates of these distinct behaviours of
SLUG and SNAIL could be found in human clinical data sets, and
therefore examined the prognostic power of SLUG and SNAIL expression (ref.
24). Across various patient populations, we found that only elevated
expression of SNAIL was consistently associated with poor survival
(Extended Data Fig. 9a).

These results strongly argue for divergent roles of Slug and Snail,
and predict that shutdown of Snail could selectively eliminate breast
TICs. To test this notion, we knocked down either Slug or Snail in
advanced MMTV-PyMT carcinoma-derived pBI.3G cells and
MDA-MB-231 human breast cancer cells (Extended Data Fig. 9b, c).
Strikingly, Snail knockdown but not Slug knockdown induced
mesenchymal-to-epithelial transition (MET), leading to loss of Zeb1
and reactivation of E-cadherin (Fig. 4a, b). When these cells were
injected orthotopically into mammary fat pads, Snail but not Slug
knockdown attenuated primary tumour growth and strongly
suppressed their metastatic spreading (Fig. 4c-e). Similarly, across a panel
of human breast cancer cell lines, we found that SNAIL knockdown
significantly suppressed tumour initiation in most of them, whereas
SLUG knockdown failed to do so (Extended Data Fig. 9d-f). In
contrast to the responses of breast cancer cells, the organoid-forming and
gland-reconstituting activities of normal murine MaSCs were
markedly affected by Slug knockdown but not by Snail knockdown (Fig. 4f
and Extended Data Fig. 9g).

Given the distinct functions exerted by Slug and Snail, we analysed the 
transcription programs controlled by these paralogous EMT-TFs using 
ChIP-seq. We focused on two MMTV-PyMT tumour cell lines that 
differed in Slug and Snail expression and tumorigenic potential 
(Extended Data Fig. 10a, b). We recovered similar numbers of chromatin 
regions that were enriched for either Slug or Snail binding (Fig. 5a). 
Across the genome, Slug- and Snail-binding sites displayed similar fold 
enrichments and were both enriched for the known Snail family
recognition CANNTG E-box motif?’ (Extended Data Fig. 10c, d). We found
that Snail occupied 10,129 promoters, far more than the number occupied by
Slug (2,475 promoters) (Fig. 5b). Interestingly, the promoters of genes
encoding key mesenchymal markers were bound only by Snail and not
by Slug (Extended Data Fig. 10e). Gene-set enrichment analyses (GSEA)
confirmed that Snail-bound but not Slug-bound genes were significantly
enriched for EMT-related signatures (Fig. 5c). In particular, Snail, but
not Slug, occupied the promoter of Zeb1 (Fig. 5d), a master regulator of
TICs under a variety of settings****. The binding of Snail appears to
activate Zeb1 expression in mammary tumour cells, because knockdown
of Snail led to downregulation of Zeb1 (Fig. 4a, c), and ectopic expression
of SNAIL in human MECs induced ZEB1 expression (Extended Data
Fig. 10f). Indeed, Snail had been reported to activate Zeb1 expression in
non-mammary types of epithelial cells*”°.
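The CANNTG E-box mentioned above can be located in a sequence with a simple pattern scan. The sketch below is illustrative only. In ChIP-seq studies, motif enrichment is normally assessed with dedicated tools against a matched background. Here the observed count is compared with a uniform-base expectation of (1/4)^4 per 6-bp window, an assumption that ignores GC content. Function names are hypothetical.

```python
import re
from typing import List

EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")   # lookahead so overlapping sites are all found
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def ebox_positions(seq: str) -> List[int]:
    """0-based start positions of CANNTG E-boxes on the given strand.

    CANNTG is its own reverse complement as a pattern, so scanning one strand
    also reports every site on the opposite strand.
    """
    return [m.start() for m in EBOX.finditer(seq.upper())]


def ebox_enrichment(seq: str) -> float:
    """Observed E-box count divided by the count expected in uniform random sequence."""
    windows = max(len(seq) - 5, 0)
    expected = windows / 256.0               # four fixed bases: (1/4)**4 per window
    return len(ebox_positions(seq)) / expected if expected else 0.0
```

Because the motif is palindromic as a pattern, a peak sequence and its reverse complement yield the same number of sites, so no separate minus-strand scan is needed.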

To investigate the possible differential abilities of SLUG and SNAIL
to control ZEB1 expression in human breast cancer cells, we used
ChIP-qPCR to examine SLUG and SNAIL binding at the ZEB1
promoter in MDA-MB-231 cells, which co-express SLUG and SNAIL.


Figure 4 | Depletion of Snail selectively affects
breast TICs. a, b, Immunofluorescence images
of the shRNA-transduced pB1.3G cells (a) and
MDA-MB-231 cells (b). Scale bars, 20 µm.

c, d, Primary tumour burdens and pulmonary 
metastases formed by orthotopically implanted 
pBI.3G cells (c, unilateral implantation, n = 5 
animals per group, *P = 0.026, **P = 1.9 X 10~°) 
and MDA-MB-231 cells (d, bilateral implantation, 
n = 5 animals per group, *P < 0.01, **P < 0.001).
NS, not significant. Source Data is associated with 
this figure. e, Fluorescent images of whole-mount 
lungs showing spontaneous metastases formed by 
the orthotopically implanted GFP-labelled MDA- 
MB-231 cells. f, Whole-mount fluorescent images 
of the mammary fat pads implanted with the 
indicated GFP-expressing primary murine MECs. 
Scale bar, 1 mm. 


Notably, although both SLUG and SNAIL appeared to occupy the
ZEB1 promoter in these cells, SLUG binding (but not SLUG
expression) was diminished in SNAIL knockdown cells (Fig. 5e-g). Since the
knockdown of SNAIL, but not SLUG, resulted in downregulation of
ZEB1 in MDA-MB-231 cells (Fig. 4c) and the binding of SLUG to the
ZEB1 promoter is dependent on SNAIL expression, we concluded that
ZEB1 expression was controlled by SNAIL but not SLUG in
MDA-MB-231 cells as well.

Figure 5 | Slug and Snail control different

Our data underscore profound differences in the transcription-regulating
activities of the endogenously encoded Slug and Snail
EMT-TFs, and indicate that normal stem cells and TICs of
the same tissue of origin could arise from different cellular
compartments and exploit different molecular signalling circuits to activate
related but distinct signalling pathways. We have previously correlated
high levels of SLUG expression with poor prognosis in human breast
cancer patients, and attributed this correlation to the experimentally
observed EMT-inducing function of SLUG. However, on the basis of
the present work, we propose that the prognostic power of SLUG
expression may be due in large part to its strong association with basal
differentiation, which is, on its own, a well-known feature of aggressive
breast cancers. Nonetheless, because our in vivo analyses focused on
MMTV promoter-driven tumours, which appear to derive primarily
from luminal MECs, the functions of Slug in tumours derived from basal
MECs remain to be characterized.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Animals. The Slug-YFP and Snail-YFP knock-in alleles were generated by homologous
recombination in mouse embryonic stem (ES) cells using standard gene-targeting
methods. The MMTV-Neu animals and CAG-mRFP animals were obtained from the
Jackson Laboratory (stock numbers 005038 and 005884). The MMTV-PyMT
animals were originally obtained from the Jackson Laboratory (stock number
002374) and backcrossed for five generations onto the C57BL/6 background. Mice were
housed and handled in accordance with protocols approved by the Animal Care
and Use Committees of the Massachusetts Institute of Technology. Animals were
randomized by age and weight. The investigators were blinded to allocation during
experiments and outcome assessment for the experiments shown in Fig. 3f, g and
Extended Data Fig. 10b.

Genotyping. PCR primers (5′ to 3′) for Slug-YFP genotyping were (sense strand,
AACCTTCTCCAGAATGTCGCTTCTG; antisense strand, T@CAGGTGTATCTTATACACGTGGC)
and for Snail-YFP genotyping were (sense strand, CTCCCGCATGTCCTTGCTCCACAAG;
antisense strand, same as for Slug-YFP). DNA extractions and subsequent PCR
reactions were performed using the REDExtract-N-Amp Tissue PCR kit (Sigma XNAT).
PCR was performed with 35 cycles of denaturation at 94 °C for 30 s, annealing
at 60 °C for 30 s, and elongation at 72 °C for 30 s.

Tumour categorization. MMTV-PyMT tumours were categorized into hyperplastic
lesions, adenocarcinomas and high-grade carcinomas following the histological
criteria described by Lin et al. Briefly, in the genetic background of our
animals, hyperplastic lesions usually occur at about 8–10 weeks of age, and consist
of regionally packed lobules formed on the duct. We collectively refer to the
adenoma/MIN and early carcinoma stages defined by Lin et al. as adenocarcinomas,
because the tumours from 2.5- to 4-month-old animals often have mixed
characteristics of both kinds. High-grade carcinoma corresponds to the late
carcinoma/advanced invasive carcinoma stage defined by Lin et al., and is always
associated with metastatic disease.
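As an illustration only, the stage mapping described above can be written as a small lookup table. The stage labels used as keys are hypothetical shorthand for the Lin et al. categories named in the text, not an official nomenclature, and the function name is likewise hypothetical.

```python
# Hypothetical shorthand for the Lin et al. stages named in the text, mapped
# to the three categories used in this study.
LIN_TO_STUDY_CATEGORY = {
    "hyperplasia": "hyperplastic lesion",
    "adenoma/MIN": "adenocarcinoma",       # pooled with early carcinoma
    "early carcinoma": "adenocarcinoma",   # pooled with adenoma/MIN
    "late carcinoma": "high-grade carcinoma",
}


def categorize(lin_stage: str) -> str:
    """Return the study category for a Lin et al. stage label."""
    try:
        return LIN_TO_STUDY_CATEGORY[lin_stage]
    except KeyError:
        raise ValueError(f"unknown stage label: {lin_stage!r}") from None
```

Pooling two Lin et al. stages into one key-to-value mapping reflects the text's statement that tumours from 2.5- to 4-month-old animals often show features of both.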

Immunofluorescence staining. Tumours were fixed in 10% neutral buffered
formalin overnight and embedded in paraffin for sectioning. Sections were cut
at 5 μm. Tumour sections were deparaffinized in Histoclear II, and antigen
retrieval was performed with Nuclear Decloaker (Biocare Medical) using a
microwave. Sections were then blocked with 0.5% normal donkey serum (Jackson
ImmunoResearch Laboratories) in PBST (PBS + 0.3% Triton X-100) for 1 h at
room temperature. Sections were incubated with primary antibody at 4 °C
overnight. After three washes with PBS, sections were incubated with secondary
antibodies (Biotium) and DAPI for 2 h at room temperature, washed three times with
PBS, and mounted in ProLong Gold antifade reagent (Invitrogen P36930). For
anti-Slug, anti-Snail and anti-Zeb1 immunofluorescence, the signals were amplified
with the TSA Plus Systems (PerkinElmer) following the manufacturer’s instructions.

Tumour organoids were fixed within the collagen I gel with 4% paraformaldehyde for
1 h at room temperature, blocked with 0.5% normal donkey serum in PBST, and
then incubated with primary antibodies at 4 °C overnight. After five washes with
PBST, organoids were incubated with DAPI, secondary antibodies and phalloidin
at 4 °C overnight. After five further washes with PBST, the collagen I gel containing the
organoids was mounted in ProLong Gold antifade reagent.

Cultured tumour cells were fixed in 4% paraformaldehyde and blocked with
0.1% normal donkey serum in PBST for 30 min at room temperature. Cells
were incubated with specific primary antibodies for 1–2 h, washed three times with
PBS, and then incubated with secondary antibodies for 1 h at room temperature. After
three washes with PBS, stained cells were mounted in ProLong Gold antifade
reagent.

Immunostained samples were imaged using Zeiss LSM710 and Zeiss LSM700 
confocal microscopes and analysed with Zen software. 

Antibodies used in this study are listed in the Supplementary Information. 
Tumour dissociation, FACS fractionation and derivation of tumour cell lines. 
Tumours were removed from the animals aseptically. At least one fragment from each
tumour was saved for histological staging. The remainder of each
tumour was minced with a razor blade, and the minced pieces were
rinsed three times with PBS and digested with collagenase A followed by TrypLE
Select (Invitrogen). The dissociated tumour cells were then washed twice with
DMEM with 10% FBS, and filtered through 70 μm and 40 μm cell strainers.
The resulting cells were stained with DAPI and anti-EpCAM antibody, washed three
times with PBS, and resuspended in PBS for flow cytometry analysis and FACS
fractionation.

To establish tumour cell lines, 1 × 10^7 dissociated and filtered tumour cells were
plated in a 10 cm dish in DMEM/F12 supplemented with 5% adult bovine serum,
non-essential amino acids (Invitrogen) and penicillin/streptomycin. On the next
day, dead cells were removed by a medium change, and the attached cells were
passaged at 1:2 to 1:3 for about five passages until each culture was established.
All tumour cell lines were derived from high-grade carcinomas of 6–7-month-old
females.
Cell lines and cell culture. PyMT tumour cell lines were cultured in DMEM/F12
(1:1) supplemented with 5% adult bovine serum (Sigma B9433), non-essential
amino acids (Invitrogen 11140) and Pen/Strep (Invitrogen 15070). MDA-MB-231,
MDA-MB-361 and MDA-MB-468 cells were cultured in DMEM/F12 (1:1)
supplemented with 10% inactivated fetal bovine serum (Sigma F4135) and
Pen/Strep. SKBR3, BT20 and MCF7Ras cells were cultured in DMEM supplemented
with 10% inactivated fetal bovine serum and Pen/Strep. SUM149 and SUM159
cells were cultured in F12 medium supplemented with 5% inactivated fetal bovine
serum, insulin (5 μg ml−1) and hydrocortisone (1 μg ml−1). T47D, HS578T and
BT549 cells were cultured in RPMI supplemented with 10% inactivated fetal
bovine serum and Pen/Strep. MCF10A cells were cultured in DMEM/F12
supplemented with 5% horse serum (Sigma H1207), insulin (10 μg ml−1, Sigma I9278),
EGF (100 μg ml−1, Sigma E9644), hydrocortisone (0.5 mg ml−1, Sigma H0888)
and cholera toxin (100 ng ml−1, Sigma C8052). All human breast cancer cell lines
were obtained from ATCC and were free of mycoplasma contamination.
MMTV-PyMT tumour organoid culture was performed as previously
described. 3D organoid culture of primary MECs was performed as previously
described. Tumour-sphere assays were performed as described in Tam et al.
Two hundred cells (1,000 cells for MDA-MB-361 cells) were plated into each well of a
96-well ultra-low-attachment plate to score tumour-sphere formation efficiency. Five wells
were analysed for each condition.
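Sphere-forming efficiency is expressed per number of cells plated (200 per well, or 1,000 for MDA-MB-361 cells), and the Extended Data legends report it as mean ± s.d. across five wells. The per-well arithmetic can be sketched as follows. The function name and the sphere counts are hypothetical.

```python
from statistics import mean, stdev


def sphere_efficiency(counts_per_well, cells_plated_per_well, per=None):
    """Tumour spheres per `per` cells (default: per cells plated), one value per well."""
    per = cells_plated_per_well if per is None else per
    return [count * per / cells_plated_per_well for count in counts_per_well]


if __name__ == "__main__":
    # Hypothetical sphere counts from five wells, 200 cells plated per well.
    eff = sphere_efficiency([12, 15, 11, 14, 13], 200)
    print(f"{mean(eff):.1f} ± {stdev(eff):.2f} spheres per 200 cells (n = {len(eff)} wells)")
```

`statistics.stdev` gives the sample standard deviation (n − 1 denominator). This sketch assumes that is the s.d. reported.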
Tumour cell implantation. For cleared mammary fat pad transplantation,
primary MECs were isolated from the mammary glands of 4- to 5-week-old virgin
female mice by collagenase and trypsin digestion. Sorted EpCAM” cells were
suspended in 10 μl PBS containing 50% Matrigel and injected into the inguinal
mammary fat pads of 3-week-old NOD/SCID mice, whose endogenous mammary
epithelium was cleared at the time of injection. For tail-vein injection, tumour
cells were resuspended in 100 μl PBS and injected into male animals. The lungs
were examined 4–8 weeks after injection. For subcutaneous injections, tumour
cells were suspended in 100 μl PBS with or without Matrigel (as indicated in the
figures) and injected into the flanks of male animals. Tumour incidence and weight were
measured 2–3 months after injection. For orthotopic tumour transplantations,
tumour cells were resuspended in 20 μl medium with the indicated amount of
Matrigel and injected into mammary fat pads. Host animals were randomized
by weight. In Fig. 4c, 10° PyMT tumour cells were injected unilaterally into the
mammary fat pad (without Matrigel) to score primary tumour burden and metastatic
dissemination (six animals for each group). In Fig. 4d, 10° MDA-MB-231
cells were injected bilaterally (with 20% Matrigel) to score primary tumour burden
and metastatic dissemination (five animals for each group). For metastasis
quantification, the lungs were examined under a Leica fluorescence dissecting
microscope. Blinded quantifications were performed when scoring the numbers
of lung metastases. For limiting dilution analyses, the frequency of TICs in the cell
population being transplanted was calculated using the Extreme Limiting Dilution
Analysis program (http://bioinf.wehi.edu.au/software/elda/index.html).
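ELDA estimates TIC frequency from how often injections at each cell dose fail to form tumours. The sketch below is not the ELDA code. It computes only the maximum-likelihood point estimate under the single-hit Poisson model, P(tumour | dose) = 1 − exp(−f × dose), which is the model ELDA fits by default. It omits ELDA's confidence intervals and goodness-of-fit tests. The function name and example doses are hypothetical.

```python
import math


def tic_frequency_mle(doses, injections, tumours, lo=1e-12, hi=1.0, iters=200):
    """Return 1/f, i.e. '1 TIC in N cells', maximizing the single-hit likelihood.

    doses[i] cells were injected injections[i] times, giving tumours[i] tumours.
    The score (derivative of the log-likelihood in f) decreases monotonically,
    so its root is found by bisection on a log scale.
    """
    def score(f):
        s = 0.0
        for d, n, r in zip(doses, injections, tumours):
            if r:
                # r * d * exp(-f d) / (1 - exp(-f d)), using expm1 for accuracy
                s += r * d * math.exp(-f * d) / -math.expm1(-f * d)
            s -= (n - r) * d
        return s

    if all(r == 0 for r in tumours):
        return math.inf  # no tumours at any dose: the MLE of f is 0
    if all(r == n for n, r in zip(injections, tumours)):
        raise ValueError("every injection formed a tumour: MLE is unbounded")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 1.0 / math.sqrt(lo * hi)


if __name__ == "__main__":
    # Hypothetical design: six injections at each of three doses.
    print(round(tic_frequency_mle([10_000, 1_000, 100], [6, 6, 6], [6, 4, 1])))
```

At a single dose the estimate has a closed form, f = −ln(1 − r/n)/dose, which is a convenient check on the bisection.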
Cleared mammary fat pad injection. Cleared mammary fat pad injections were
performed as previously described. Briefly, 1 × 10° cells were suspended in 10 μl
PBS containing 50% Matrigel and injected into the inguinal mammary fat pads of
NOD/SCID female mice that had previously been cleared of endogenous mammary
epithelium. Gland reconstitution was assessed under a fluorescence dissecting
microscope 3 months after injection.
RNA isolation, reverse transcription and qPCR analysis. Total RNA from
freshly sorted primary tumour cells was extracted using Trizol (Invitrogen) and
column-purified with the PicoPure RNA Isolation Kit (Applied Biosystems). cDNA
synthesis was performed with 0.2–2 μg of total RNA using the SuperScript III
First-Strand Synthesis System (Invitrogen). mRNA levels were measured with
gene-specific primers using the Roche LightCycler 480 system. Relative
expression levels were normalized to β-actin. Primers used for qPCR analysis
are listed in Supplementary Information.
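One common way to normalize to β-actin is the comparative Ct (2^−ΔΔCt) method. The text does not say which calculation was used, so the following is only a sketch under that assumption. It also presumes near-100% amplification efficiency for both primer pairs. The function name is hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ΔΔCt method.

    ct_target / ct_ref: Ct of the gene of interest and of the reference gene
    (beta-actin here) in the sample; *_cal: the same two Cts in the calibrator.
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    delta_ct_sample = ct_target - ct_ref
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


if __name__ == "__main__":
    # Target crosses threshold 2 cycles earlier than in the calibrator
    # at equal beta-actin Ct.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))  # → 4.0
```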
shRNA vectors. The sources or targeting sequences of the shRNAs used in the
study are as follows: mouse shSlug, shSlug4 from Guo et al.; mouse shSnail,
from Shibue et al.; human shSLUG, clones 1 and 2 from ref. 34; human
shSNAIL, no. 1 TTCCTTGTTGCAGTATTTG, no. 2 ATAAATACCAGTGTACCTT;
shLuciferase, CCTAAGGTTAAGTCGCCCTCG.
Meta-analysis of oncogenomic data. To test whether the expression of Slug or
Snail correlated with distant metastasis-free survival (all subtypes, n = 1,610) and
relapse-free survival (ER+ patients, n = 1,802; PR+ patients, n = 525; ER PR™
patients, n = 346), the data sets GSE1456, GSE2034, GSE2990, GSE3494, GSE4922,
GSE6532, GSE7390, GSE11121, GSE12093, GSE5327, GSE9195, GSE16391,
GSE12276, GSE2603, GSE17705, GSE21653, GSE16446, GSE17907, GSE19615,
GSE20685, GSE20711, GSE26971, GSE31448, GSE31519, E-MTAB-365,
GSE20194 and GSE20271 were analysed and Kaplan–Meier plots were generated
using http://kmplot.com (ref. 24). Patient samples were classified as high or
low expressers of each gene of interest on the basis of its expression level,
using the upper tertile as the cut-off; the median was computed over the entire
data set.
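kmplot.com performs this grouping internally. The sketch below only illustrates the logic and rests on two assumptions: a sample counts as "high" when it lies strictly above the upper-tertile cut point, and that cut point is computed with Python's default (exclusive) quantile method. It also shows the product-limit (Kaplan–Meier) estimate that underlies the survival curves. Function names are hypothetical.

```python
from statistics import quantiles


def split_upper_tertile(expression):
    """Label each sample 'high' if its expression exceeds the upper-tertile
    cut point computed over the whole data set, otherwise 'low'."""
    cutoff = quantiles(expression, n=3)[1]  # [lower, upper] tertile cut points
    return ["high" if x > cutoff else "low" for x in expression], cutoff


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 if the event (e.g. relapse) occurred,
    0 if the patient was censored. Returns [(time, S(time))] at event times.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, n_events, n_leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve
```

Each expression group from `split_upper_tertile` would be passed separately to `kaplan_meier` to obtain one curve per group.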

Chromatin immunoprecipitation sequencing. ChIP assays were carried out as
described previously. The ChIP-seq results have been deposited in the Gene
Expression Omnibus (GEO) under accession number GSE61198. We performed
ChIP-seq experiments using either an anti-Slug or an anti-Snail antibody. From the
Slug-high pBI.1G cells, as anticipated, we could recover chromatin fragments only from
Slug ChIP and not from Snail ChIP. Conversely, from the Snail-high pBI.3G cells,
chromatin fragments were recoverable only from Snail ChIP and not from Slug ChIP,
demonstrating the specificities of these antibodies. Slug- or Snail-bound DNA sites
were determined with the Model-based Analysis of ChIP-Seq (MACS) algorithm, and
bound target genes were defined as those with Slug or Snail occupancy within 5 kb
upstream or downstream of the transcription start site of each RefSeq
transcript. The fold enrichment of each MACS peak was calculated against the
whole-cell extract. Enriched motifs were identified using 1,000 nucleotides centred
on the peak summit of the top 1,000 Slug and Snail peaks (ranked by MACS peak
score). The sequences were processed through MEME-ChIP
(http://meme.nbcr.net/meme/cgi-bin/meme-chip.cgi) using default settings.
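This paragraph uses two interval rules: ±5 kb around each RefSeq TSS for assigning target genes, and 1,000-nt windows centred on the summits of the top 1,000 peaks for motif discovery. Both can be sketched as follows. Treating a peak summit as the position of "occupancy" and using dictionary-based input formats are our assumptions. This is not the MACS or MEME-ChIP code, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

FLANK = 5_000  # bp upstream and downstream of each TSS, as stated in the text


def assign_targets(summits_by_chrom, tss_by_transcript, flank=FLANK):
    """Return {transcript: [peak summits within ±flank bp of its TSS]}.

    summits_by_chrom: {chrom: [summit positions]} (hypothetical input format)
    tss_by_transcript: {transcript_id: (chrom, tss_position)}
    """
    sorted_summits = {c: sorted(p) for c, p in summits_by_chrom.items()}
    targets = {}
    for tx, (chrom, tss) in tss_by_transcript.items():
        summits = sorted_summits.get(chrom, [])
        lo = bisect_left(summits, tss - flank)   # first summit >= TSS - flank
        hi = bisect_right(summits, tss + flank)  # one past last summit <= TSS + flank
        if hi > lo:
            targets[tx] = summits[lo:hi]
    return targets


def motif_windows(peaks, top_n=1_000, width=1_000):
    """Windows of `width` nt centred on the summits of the top_n peaks.

    peaks: [(chrom, summit, macs_score)]; returns [(chrom, start, end)],
    with start clipped at 0 near chromosome ends.
    """
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)[:top_n]
    half = width // 2
    return [(chrom, max(0, summit - half), summit + half)
            for chrom, summit, _ in ranked]
```

Binary search over sorted summits keeps each TSS lookup logarithmic in the number of peaks per chromosome.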

Gene set enrichment analyses (GSEA) were performed with the GSEA platform
of the Broad Institute (http://www.broadinstitute.org/gsea/index.jsp). Slug-bound
genes and Snail-bound genes were ranked according to the fold enrichment of the
corresponding MACS peaks.
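GSEA's preranked mode takes a two-column, tab-separated .rnk file of gene identifiers and scores. Below is a minimal sketch of building such a list from MACS fold enrichments. The text does not say how genes with several peaks were handled, so keeping only the strongest peak per gene is our assumption. The function name and gene names are hypothetical.

```python
def build_rnk(gene_fold_enrichment):
    """Build lines of a GSEA preranked (.rnk) file: gene<TAB>score, highest first.

    gene_fold_enrichment: iterable of (gene, MACS fold enrichment) pairs,
    possibly several per gene; the strongest peak per gene is kept (assumption).
    """
    best = {}
    for gene, fe in gene_fold_enrichment:
        if gene not in best or fe > best[gene]:
            best[gene] = fe
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{gene}\t{score:g}" for gene, score in ranked]


if __name__ == "__main__":
    # Hypothetical peaks; join the lines with "\n" and write them to a .rnk file.
    print("\n".join(build_rnk([("GeneA", 8.5), ("GeneB", 3.0), ("GeneA", 12.0)])))
```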


For SLUG and SNAIL ChIP-qPCR in MDA-MB-231 cells, ChIP-enriched DNA
was analysed by real-time PCR using the ABI PRISM 7900 sequence detection
system and SYBR green master mix. Relative occupancy values were calculated by
determining the apparent immunoprecipitation efficiency (the ratio of the amount of
immunoprecipitated DNA to that of the input sample) and normalized to the level
observed. The primers used for real-time PCR were: ZEB1 target locus, forward
ACAAGCGAGAGGATCATGGCGG, reverse CACTCACCGTTATTGCGCCG;
ZEB1 control locus, forward TAATAATGGGCGGCAACGGC, reverse
AGGAACCAAAGCGAGCCCCT.
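The apparent IP efficiency can be estimated from qPCR Ct values as sketched below. Several elements are our assumptions: the input-dilution correction, perfect doubling per cycle, and normalization to the ZEB1 control locus listed above. The text does not state what the efficiencies were normalized to. Function names are hypothetical.

```python
import math


def ip_efficiency(ct_ip, ct_input, input_fraction):
    """Apparent IP efficiency: IP DNA as a fraction of total input chromatin.

    Assumes perfect doubling per cycle. input_fraction is the share of
    chromatin kept as the input sample (e.g. 0.01 for a 1% input).
    """
    # Shift the input Ct to the value 100% of the chromatin would give.
    ct_total = ct_input - math.log2(1.0 / input_fraction)
    return 2.0 ** (ct_total - ct_ip)


def relative_occupancy(target, control):
    """IP efficiency at the target locus divided by that at a control locus.

    target, control: (ct_ip, ct_input, input_fraction) tuples. Normalizing to
    a control locus is an assumption made for this sketch.
    """
    return ip_efficiency(*target) / ip_efficiency(*control)
```

A lower IP Ct means more recovered DNA, so each cycle of difference between target and control corresponds to a twofold difference in apparent occupancy.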

Statistical analysis. Statistical analyses were carried out by two-tailed Student’s 
t-test unless otherwise specified. No statistical methods were used to predetermine 
sample size. 
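For reference, the two-tailed, two-sample Student's t-test named here (pooled variance) can be written with the standard library alone. The p-value uses the identity P(|T| > t) = I_{ν/(ν+t²)}(ν/2, 1/2), where I is the regularized incomplete beta function, evaluated by a continued fraction. This is a self-contained sketch; in practice a statistics package such as SciPy's `scipy.stats.ttest_ind` would normally be used. Function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (tiny if abs(d) < tiny else d)
            c = 1.0 + aa / c
            c = tiny if abs(c) < tiny else c
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p-value for Student's t with df degrees of freedom."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(x, y):
    """Two-sample Student's t-test with pooled variance; returns (t, df, p)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    df = nx + ny - 2
    pooled_var = (sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)) / df
    t = (mx - my) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return t, df, two_tailed_p(t, df)
```

The even and odd continued-fraction steps are folded into one inner loop; the convergence test runs after each pair, matching the usual Numerical Recipes formulation.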
Extended Data Figure 1 | Slug expression is associated with a partial EMT
phenotype in normal MECs. a, Validation of the Slug-YFP knock-in reporter.
Mammary tumour sections from Slug-YFP;MMTV-PyMT female mice
were stained for YFP (green), Slug (red), cytokeratin (grey) and DAPI (blue).
b, Validation of the Snail-YFP knock-in reporter. Mammary tumour sections
from Snail-YFP;MMTV-PyMT female mice were stained for YFP (green), Snail
(red), cytokeratin (grey) and DAPI (blue). c, Lin− cells of normal mammary
glands were separated into luminal MECs, basal MECs and stromal fibroblasts
using the CD24 and CD49f cell-surface markers. d, e, Representative FACS
histograms showing relative expression levels of the Slug-YFP and Snail-YFP
reporters in the indicated cell populations in mammary glands during puberty
(d) and during pregnancy (e). Note that luminal MECs from pregnant females
exhibit higher levels of autofluorescence (grey dashed line in panel e).
f, Normal human mammary tissue sections were stained for Slug or Zeb1
(green), CK14 (red), CK8 (grey) and DAPI (blue). Arrowheads indicate
Slug+CK14+ cells. g, Representative FACS histogram showing expression levels
of the epithelial cell-surface marker EpCAM in the indicated populations of
the normal mammary gland. Panels d, e and g are representative of three
independent experiments. All scale bars, 10 μm.
Extended Data Figure 3 | Snail activation is associated with invasive
changes in mammary tumour cells ex vivo. a, Freshly isolated tumour
organoids stained for YFP (green), CK14 (red), CK8 (grey) and DAPI (blue).
Note that only background staining was detected for YFP and CK14. Scale bar,
20 μm. b, Tumour organoids from animals of the indicated genotypes were
cultured in type I collagen gel for 48 h and stained for YFP (green), phalloidin
(red) and DAPI (blue). Scale bar, 10 μm. c, Frequency of CK8+CK14+ leader
cells expressing Slug-YFP and Snail-YFP (n, number of cells). Tumour
organoids from five different animals were analysed for each genotype.
d, Schematic diagram summarizing expression patterns of Snail and Slug in the
normal mammary gland and at different stages of mammary tumour
development in the MMTV-PyMT model.
[Schematic: EMT activation and progression in primary mammary tumours, driven by Snail]
Extended Data Figure 4 | Differential expression of Snail and Slug in
MMTV-Neu and BRCA-1/p53-minus models of mammary tumours.
a–c, Representative immunofluorescence images of sections of aggressive
MMTV-Neu tumours stained for DAPI (blue), cytokeratin (red) and Slug
(green, a), Snail (green, b) or Zeb1 (green, c). Scale bar, 10 μm. d, H&E staining
showing representative histology of a differentiated area in MMTV-cre;p53°’;
BRCAL tumours. Scale bar, 50 μm. e, Representative immunofluorescence images
of tumour sections stained for the indicated proteins. Five tumours were analysed, and
quantifications are shown in f (n, number of cells). Scale bar, 10 μm. g, H&E
staining showing representative histology of a differentiated area in
MMTV-cre;p53\’;BRCAV “# tumours. Scale bar, 50 μm. h, Representative
immunofluorescence images of the invasive areas in MMTV-cre;p53°/ “2
BRCAL™" tumours stained for the indicated proteins. Five tumours were
analysed, and quantifications are shown in i (n, number of cells).
Extended Data Figure 5 | Differential expression of Snail and Slug in
human breast cancer cell lines. a, Representative immunofluorescence images
of the indicated human breast cancer cell lines stained for DAPI (blue), SNAIL
(green) and SLUG (red). Scale bar, 10 μm. b, Quantification of SLUG versus
SNAIL expression in the indicated human breast cancer cell lines (n, number of
cells). Five fields were counted for each cell line. c, Representative images
showing the morphologies of the MCF10A series of cell lines in culture. Scale
bar, 50 μm. d, Western blot showing expression of SLUG and SNAIL in the
indicated MCF10A cell lines. Panels a–d represent two independent
experiments. Uncropped western blots are available in Supplementary
Information.
Extended Data Figure 6 | Isolation of tumour cell subpopulations with
differential Snail and Slug expression by FACS. a, b, Representative whole-mount
images showing tumour progression in the transplantation model of
mammary tumours illustrated in Fig. 3a. The implanted cells initially formed
rudimentary gland-like structures (a) and eventually progressed to become
high-grade carcinomas that spontaneously metastasized to the lungs. The RFP
marker allows detection of pulmonary metastases, as shown in b. Scale bars,
500 μm. Images represent five independent experiments. c, d, FACS profiles of
RFP+ tumour cells in the pulmonary metastases corresponding to the primary
tumours shown in Fig. 3b, c. Major populations are outlined with dashed
circles. e, Snail-YFP;MMTV-PyMT tumour cells were separated into the indicated
populations by FACS. The morphologies of the unfractionated cells and the
purified populations are shown. Scale bar, 50 μm. f, Western blots
showing expression of EMT markers in the indicated cell populations.
g, Slug-YFP;MMTV-PyMT tumour cells were separated into the indicated
populations by FACS. The morphologies of the unfractionated cells and the
purified populations are shown. Scale bar, 50 μm. h, Western blots showing
expression of Slug, YFP and Snail in the indicated cell populations. Uncropped
western blots are available in Supplementary Information. e–h, Data
represent three independent experiments.
a, Experimental scheme for Fig. 3f, g and Extended Data Fig. 8a–g. b, c, Tumour
cell subpopulations from Snail-YFP;MMTV-PyMT tumour cell line (b) and
frequencies were evaluated by ELDA. b, c, Tumour initiation was scored and
presented as (number of tumour incidences/number of injections).
Extended Data Figure 8 | Breast TICs express Snail. a, H&E staining
showing the histology of the donor primary tumour from which the cells used in
Fig. 3f were isolated. Scale bar, 200 μm. b, The original pulmonary metastases
spawned by the primary tumour (left panel), and pulmonary metastases
formed by the indicated tumour cell populations following tail-vein injection.
Scale bar, 500 μm. c, Higher-magnification images of H&E-stained lung
sections showing the histology of the original pulmonary metastases in the
donor animal (left panel), and of pulmonary metastases formed by the
Slug-YFP'°’EpCAM"’ tumour cells following tail-vein injection. Scale bar,
200 μm. d, Representative immunofluorescence image of sections of
pulmonary metastases formed by the Slug-YFP’°’EpCAM”™’ tumour cells,
stained for DAPI (blue), Slug (green), CK14 (red) and CK8 (grey).
Arrowheads indicate Slug-positive cells. Scale bar, 20 μm. Images represent
four independent experiments. e, H&E staining of the donor primary tumour
from which the cells used in Fig. 3g were isolated (left panel), and H&E staining
of primary tumours formed by the indicated populations following
subcutaneous implantation (with 25% Matrigel). Scale bar, 200 μm. f, Primary
tumour burdens formed by the indicated populations after subcutaneous
implantation (for Ep>CAM°“Slug™ cells, 1 × 10° cells were injected; for the
other two groups, 1 × 10° cells were injected). Primary tumours and lungs were
analysed 10 weeks after injection (n = 10 injection sites for each group).
Open circles indicate failure of tumour initiation. Source Data is associated with
this figure. g, H&E staining of lung sections showing metastatic outgrowths
spawned by the indicated cell populations following subcutaneous
implantation. Scale bar, 500 μm.
Extended Data Figure 9 | Snail and Slug are differentially employed by
normal MaSCs and breast TICs. a, Kaplan–Meier plots showing survival of
patients with the indicated subtypes of breast cancer. Patient groups were
separated on the basis of SLUG (top row) or SNAIL (bottom row) mRNA expression.
b, Western blot confirming Slug and Snail knockdown in an established PyMT
tumour cell line transduced with the indicated shRNA expression vectors. The
shLuciferase (shLuc) shRNA was used as a control. c, Western blot confirming
SLUG and SNAIL knockdown in MDA-MB-231 cells transduced with the
indicated shRNA expression vectors. shLuc was used as a control. Uncropped
western blots are available in Supplementary Information. d, Tumour-sphere
formation efficiencies (no. tumour spheres/1,000 cells for MDA-MB-361 cells,
and no. tumour spheres/200 cells for all the other cell lines) of the indicated
human breast cancer cells transduced with shSLUG#2, shSNAIL#2 or the
shLuc control (mean ± s.d., n = 5 technical replicates per group). Data
represent two independent experiments. e, f, SUM159 (e) and SUM149 (f) cells
transduced with the indicated shRNAs were injected subcutaneously at
limiting dilutions to score primary tumour formation. Tumour initiation was
scored and presented as (no. of tumour incidences/no. of injections). Data
represent two independent experiments. g, Organoid-forming efficiencies
of normal MECs transduced with the indicated shRNA expression vectors
(mean ± s.d., n = 6 technical replicates per group, *P < 0.001; NS, not
significant). Scale bar, 100 μm. Data represent three independent experiments.
Extended Data Figure 10 | Slug and Snail occupy different genomic regions.
a, Western blots showing expression of EMT-TFs and EMT markers in the
PyMT tumour cell lines used for the ChIP-seq analyses. Uncropped western
blots are available in Supplementary Information. Data represent three
independent experiments. b, Pulmonary metastases formed by 100,000 cells of
the indicated cell lines following tail-vein injection (n = 9 animals per group).
Source Data is associated with this figure. c, Box plot showing distributions
of fold enrichment of all peaks identified in Snail ChIP and Slug ChIP.
Horizontal bar indicates the median and whiskers indicate the top and bottom
tertiles. d, Sample top motifs enriched around the summits of the anti-Snail and
anti-Slug ChIP peaks. e, Sample ChIP-seq signals for Slug and Snail are shown.
Left column shows promoters bound by Slug only. Right column shows
promoters bound by Snail only. Arrows indicate the directions of transcription.
f, MCF10A human mammary epithelial cells were transduced with rtTA
and SNAIL driven by a tet-on promoter, untreated (left panel) or treated
with 2 μg ml−1 doxycycline (dox) for 48 h (right panel), and stained for
E-cadherin (green) and ZEB1 (red). Scale bar, 20 μm. Data represent five
independent experiments.
A spatial model predicts that dispersal and cell
turnover limit intratumour heterogeneity


Bartlomiej Waclaw, Ivana Bozic, Meredith E. Pittman, Ralph H. Hruban, Bert Vogelstein & Martin A. Nowak


Most cancers in humans are large, measuring centimetres in
diameter, and composed of many billions of cells. An equivalent
mass of normal cells would be highly heterogeneous as a result of
the mutations that occur during each cell division. What is remarkable
about cancers is that virtually every neoplastic cell within a
large tumour often contains the same core set of genetic alterations,
with heterogeneity confined to mutations that emerge late
during tumour growth. How such alterations expand within the
spatially constrained three-dimensional architecture of a tumour,
and come to dominate a large, pre-existing lesion, has been
unclear. Here we describe a model for tumour evolution that shows
how short-range dispersal and cell turnover can account for rapid
cell mixing inside the tumour. We show that even a small selective
advantage of a single cell within a large tumour allows the
descendants of that cell to replace the precursor mass in a clinically
relevant time frame. We also demonstrate that the same mechanisms
can be responsible for the rapid onset of resistance to chemotherapy.
Our model not only provides insights into spatial and
temporal aspects of tumour growth, but also suggests that targeting
short-range cellular migratory activity could have marked
effects on tumour growth rates.

Tumour growth is initiated when a single cell acquires genetic or
epigenetic alterations that change the net growth rate of the cell (birth
minus death), and enable its progeny to outgrow surrounding cells. As
these small lesions grow, the cells acquire additional alterations that
cause them to multiply even faster and to change their metabolism to
better survive harsh conditions and nutrient deprivation. This
progression eventually leads to a malignant tumour that can invade
surrounding tissues and spread to other organs. Typical solid tumours
contain about 30–70 clonal amino-acid-changing mutations that have
accumulated during this multi-stage progression. Most of these mutations
are believed to be passengers that do not affect growth, and only
~5–10% are drivers that provide cells with a small selective growth
advantage. Nevertheless, a major fraction of the mutations, particularly
the drivers, are present in 30–100% of neoplastic cells in the
primary tumour, as well as in metastatic lesions derived from it.

Most attempts at explaining the genetic make-up of tumours
assume well-mixed populations of cells and do not incorporate spatial
constraints. Several models of the genetic evolution of expanding
tumours have been developed in the past, but they assume either
very few mutations or one- or two-dimensional growth.
Conversely, models that incorporate spatial limitations have been
developed to help to understand processes such as tumour metabolism,
angiogenesis and cell migration, but these models ignore genetics.
Here, we formulate a model that combines spatial growth and
genetic evolution, and use the model to describe the growth of primary
tumours and metastases, as well as the development of resistance to
therapeutic agents.


We first model the expansion of a metastatic lesion derived from a 
cancer cell that has escaped its primary site (for example, breast or 
colorectal epithelium) and travelled through the circulation until it 
lodged at a distant site (for example, lung or liver). The cell initiating 
the metastatic lesion is assumed to have all the driver gene mutations 
needed to expand. Motivated by histopathological images (Fig. 1a), we 
model the lesion as a conglomerate of ‘balls’ of cells (see Methods and 
Extended Data Fig. 1). Cells occupy sites in a regular three-dimen- 
sional lattice (Extended Data Fig. 2a, b). Cells replicate stochastically 
with rates proportional to the number of surrounding empty sites 


Figure 1 | Structure of solid neoplasms. a, Hepatocellular carcinoma 
composed of balls of cells (circled in green) separated by non-neoplastic tissue 
(asterisk). b, Adjacent section of the bottom tumour in a immunolabelled 
with the proliferation marker Ki67. The edge of the tumour is delineated in red; 
the centre is marked with a green circle. Proliferation is decreased in the centre 
when compared to the edge of the neoplasm. c, d, Higher magnification of 
the centre (c) and the edge (d) with each proliferating neoplastic cell marked 
by a green dot. The blue nuclei without green dots are non-proliferating. The 
red circle in c demonstrates an example of cells (inflammatory cells) that 
were not included in the count of neoplastic cells. The neoplastic tissue in 

d is above the red line; non-neoplastic (normal liver) is below the red 

line. Comparison of c with d shows that proliferation of neoplastic cells is 
decreased in the centre as compared to the edge of the lesion (quantified in 
Extended Data Table 1). 
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(non-neoplastic cells or extracellular matrix), hence replication is 
faster at the edge of the tumour. This is supported by experimental 
data (Fig. 1b-d and Extended Data Table 1). A cell with no cancer cell 
neighbours replicates at the maximal rate of b = In(2) = 0.69 days ', 
in which b denotes the initial birth rate, equivalent to 24h cell- 
doubling time, and a cell that is completely surrounded by other cancer 
cells does not replicate. Cells can also mutate, but we assume all muta- 
tions are passengers (they do not confer fitness advantages). After 
replication, a cell moves with a small probability (M) to a nearby site
close to the surface of the lesion and creates a new lesion. This ‘sprouting’
of initial lesions could be due to short-range migration after an
epithelial-to-mesenchymal transition'* and subsequent reversion to a
non-motile phenotype. Alternatively, it could be the result of another
process such as angiogenesis (Methods), through which the tumour
gains better access to nutrients. The same model governs the evolution
of larger metastatic lesions that have already developed extensive
vasculature. Cells die with a death rate (d) independent of the number of
neighbours, and are replaced by empty sites (non-neoplastic cells
within the local tumour environment).
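The replication rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the rate scales linearly with the fraction of empty sites among the 26 Moore neighbours (so an isolated cell divides at the maximal rate b and a fully surrounded cell does not divide at all), and the names `birth_rate` and `MOORE_OFFSETS` are hypothetical.

```python
import math
from itertools import product

# The 26 offsets of the 3D Moore neighbourhood (all unit-cube neighbours).
MOORE_OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def birth_rate(cell, occupied, b):
    """Replication rate b * (empty neighbours / 26) of one tumour cell.

    cell: (x, y, z) lattice site of the cell.
    occupied: set of sites holding tumour cells; every other site counts as
              empty (normal cells or extracellular matrix, not modelled).
    """
    x, y, z = cell
    empty = sum((x + dx, y + dy, z + dz) not in occupied
                for dx, dy, dz in MOORE_OFFSETS)
    return b * empty / len(MOORE_OFFSETS)

if __name__ == "__main__":
    b = math.log(2)                               # maximal rate, about 0.69 per day
    print(birth_rate((0, 0, 0), {(0, 0, 0)}, b))  # isolated cell: full rate b
    block = set(product(range(-1, 2), repeat=3))  # 3 x 3 x 3 block of cells
    print(birth_rate((0, 0, 0), block, b))        # fully surrounded centre: 0.0
```

The death rate needs no such lookup, because d is independent of the neighbourhood in this model.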

If there is little dispersal (M ~ 0), the shape of the tumour becomes 
roughly spherical as it grows to a large size (Fig. 2a and Supplementary 
Video 2). However, even a very small amount of dispersal markedly 
affects the predicted shape. For M > 0, the tumour forms a conglomerate
of ‘balls’ (Fig. 2b, Extended Data Fig. 2c and Supplementary
Video 3), much like those observed in actual metastatic lesions, with
the balls separated by islands of non-neoplastic stromal cells mixed
with extracellular matrix. In addition to this remarkable change in
topology, dispersal strongly affects the growth rate and doubling time
of the tumour. Whereas the size (N) of the tumour increases with time
(T) from initiation as N ~ T³ without dispersal (Extended Data Fig. 3a, b),
it grows much faster (N ~ exp(const × T) for large T) when M > 0
(Fig. 2c). This also remains true for long-range dispersal, in which M
affects the probability of escape from the primary tumour into the
circulation to create new lesions in distant organs (metastasis).
Using plausible estimates for the rates of cell birth, death and dispersal 
probability, we calculate that it takes 8 years for a lesion to grow from 
one cell to one billion cells in the absence of dispersal (M = 0), but less 
than 2 years with dispersal (Fig. 2c). The latter estimate is consistent 
with experimentally determined rates of metastasis growth as well 
as clinical experience, while the conventional model (without
dispersal) is not.
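The contrast between the two growth laws can be made plausible with a heuristic scaling argument (ours, not the derivation in the Methods). Without dispersal only cells near the surface divide, so the radius R of a single ball advances at a roughly constant speed v. With dispersal, assuming new balls are seeded at a rate proportional to the number of divisions and each small ball grows at close to the full net rate, the number of actively dividing cells stays roughly proportional to N:

```latex
\text{no dispersal } (M = 0):\quad R(T) \approx vT
  \;\Rightarrow\; N(T) \propto R(T)^3 \propto T^3

\text{dispersal } (M > 0):\quad \frac{dN}{dT} \approx c\,N
  \;\Rightarrow\; N(T) \propto e^{cT}
```

Here c is an effective rate that depends on b, d and M; its value is not derived here.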
Non-spatial models point to the size of a tumour as a crucial determinant
of chemotherapeutic drug resistance’’*’. To determine
whether a spatial model would similarly predict this dependency in
a clinically relevant time frame, we calculated tumour regrowth
probabilities after targeted therapies. We assume that the cell that initiates
the lesion is susceptible to treatment (otherwise the treatment would
have no effect on the mass) and that the probability of a resistance
mutation is 10⁻⁷ per cell division (Methods); only one such mutation is needed for
regrowth.

Figure 3a shows snapshots from a simulation (Supplementary 
Video 1) performed before and after the administration of a typical 
targeted therapy at time T = 0. At first, the size of the lesion (~3 mm at 
T = 0) rapidly decreases, but 1 month later resistant clones begin to 
proliferate and form tumours of microscopic size. Such resistant
subclones are predicted to be nearly always present in lesions of sizes that
can be visualized by clinical imaging techniques”’”*. By 6 months after 
treatment, the lesions have regrown to their original size. The evolution
of resistance is a stochastic process: some lesions shrink to zero
and some regrow (Extended Data Fig. 4a). Figure 3b, c shows the
probability of regrowth versus the time from the initiation of the lesion
to the onset of treatment for varying net growth rates b − d and
dispersal probabilities. Regardless of growth rate, the capacity to
migrate makes it more likely that regrowth will occur sooner, particularly
for more aggressive cancers, that is, those with higher net
growth rates (Fig. 3b). This conclusion is in line with recent theoretical 
work on evolving populations of migrating cells”. If resistant mutations
additionally increase the dispersal probability before or during
treatment, regrowth is faster (Extended Data Fig. 4b, c). 

Having shown that the predictions of the spatial model are consistent
with metastatic lesion growth and regrowth times, we turn to
primary tumours. In contrast to metastatic lesions, here the situation
is considerably more complex because the tumour cells are continually
acquiring new driver gene mutations that can endow them with fitness
advantages over adjacent cells within the same tumour. Our model of a
primary tumour assumes that it is initiated via a single driver gene
mutation that provides a selective growth advantage over normal
neighbouring cells. Each subsequent driver gene mutation reduces
the death rate as d = b(1 − s)^k, in which k is the number of driver
mutations in the cell (k ≥ 1), and s is the average fitness advantage
per driver. Almost identical results are obtained if driver gene mutations
increase cell birth rather than decrease cell death, or affect both


Figure 2 | Short-range dispersal affects size,
shape and growth rate of tumours. a, b, A
spherical lesion in the absence of dispersal (M = 0)
(a) and a conglomerate of lesions (b), each initiated
by a cell that has migrated from a previous
lesion, for low but non-zero migration (M = 10°).
Colours reflect the degree of genetic similarity;
cells with similar colours have similar genetic
alterations. The death rate is d = 0.8b,
which corresponds to a net growth rate of
0.2b ≈ 0.14 day⁻¹, and N = 10’ cells. c, Dispersal
(M > 0) causes the tumour to grow faster in
time. Each point represents 100 samples; error bars (too
small to be visible) are s.e.m. Continuous lines
(extrapolation) are 6,000 X 10°" (green) and
1,000 X 10°77 (blue).
Figure 3 | Treatment success rates depend on the net growth rate of
tumours. a, Time snapshots before and during therapy (M = 10 °). Resistant
subpopulations that cause the tumour to regrow after treatment can be seen
at T = 1 month. b, c, Probability of tumour regrowth (P_regrowth) as a function
of time after treatment initiation, for different dispersal probabilities (M) and
net growth rates of the resistant cells. b, A higher net growth rate leads to
a high regrowth probability, so that 50% of tumours regrow 6 months after
treatment is initiated when M = 10 °. c, Tumours with lower net growth rates
require >20 months to achieve the same probability of regrowth. Number of
samples = 1 to 800 per point (282 on average). Error bars are s.e.m. See
Methods for details.


cell birth and cell death (Extended Data Fig. 5b); the most important 
parameter is the fitness gain, s, conferred by each driver mutation. 
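The death-rate rule d = b(1 − s)^k can be written out directly. The helper names below are illustrative, not taken from the authors' code. With one driver and s = 1%, the rule gives d = 0.99b, the value used for Fig. 4, which corresponds to a net growth rate of about 0.007 per day.

```python
import math

def death_rate(b, s, k):
    """Death rate of a cell carrying k >= 1 drivers: d = b * (1 - s) ** k."""
    if k < 1:
        raise ValueError("k counts drivers including the initiating one (k >= 1)")
    return b * (1.0 - s) ** k

def net_growth_rate(b, s, k):
    """Net growth rate b - d; each extra driver lowers d by a further factor (1 - s)."""
    return b - death_rate(b, s, k)

b = math.log(2)  # maximal birth rate, about 0.69 per day
for k in (1, 2, 3):
    print(k, round(death_rate(b, 0.01, k) / b, 4), round(net_growth_rate(b, 0.01, k), 4))
```

Swapping the rule so that drivers raise b instead of lowering d would only change `death_rate`. The main text reports that this choice makes almost no difference to the results.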

Figure 4a shows that in the absence of any new driver mutations (as 
for a perfectly normal cell growing in utero), clonal subpopulations 
would be restricted to small, localized areas. Each of these areas has at 
least one new genetic alteration, but none of them confers a fitness 
advantage (they are ‘passengers’). In an early tumour, in which the 
centre cell contains the initiating driver gene mutation, the same structure
would be observed, as long as no new driver gene mutations have
yet appeared. The occurrence of a new driver gene mutation, however,
markedly alters the spatial distribution of cells. In particular, the
heterogeneity observed in normal cells (Fig. 4a) is substantially reduced
(Fig. 4b and Supplementary Video 5). The degree of heterogeneity can
be quantified by calculating the number of genetic alterations (passengers
plus drivers) shared between two cells separated by various
distances (Fig. 4d-f). The genetic diversity is markedly decreased (Fig. 4e),
even with relatively small fitness advantages (s = 1%). This also has 
implications for the number of genetic alterations that will be present 
in a macroscopic fraction (for example, >50%) of all cells. Figure 4f 
shows that this number is many times larger for s = 1% than s = 0%. 
Furthermore, our model predicts that virtually all cells within a large 
tumour will have at least one new driver gene mutation after 5 years of 
growth (Extended Data Fig. 5a). The faster the clonal expansion occurs 
(the larger s is), the smaller the number of passenger mutations 
(Extended Data Fig. 5d, e). Our results are also robust to changes to 
the model (Methods and Extended Data Figs 5 and 6). We stress that 
an important prerequisite for limiting heterogeneity is cell turnover in 
the tumour, because in the spatial setting cells with driver mutations 
can ‘percolate’ through the tumour only if they replace other cells. In 
the absence of cell turnover, tumours are much more heterogeneous 
(Extended Data Fig. 6d). 
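Both heterogeneity measures used in Fig. 4 reduce to simple set operations once each cell's genetic alterations are stored as a set of unique identifiers. The two measures are the number of alterations shared by two cells and the number carried by at least 50% of cells. The infinite-allele assumption described in the Methods makes unique identifiers possible. Below is a minimal Python sketch on toy data; the cells and helper names are hypothetical.

```python
from collections import Counter

def shared_alterations(cell_a, cell_b):
    """Number of genetic alterations carried by both cells.

    Under the infinite-allele assumption every alteration has a unique id,
    so a shared id means both cells inherited it from a common ancestor.
    """
    return len(cell_a & cell_b)

def frequent_alterations(cells, fraction=0.5):
    """Number of alterations present in at least `fraction` of all cells."""
    counts = Counter(ga for cell in cells for ga in cell)
    needed = fraction * len(cells)
    return sum(1 for c in counts.values() if c >= needed)

# Toy tumour of four cells: ids 1 and 2 arose early, 3-5 later.
cells = [{1, 2, 3}, {1, 2}, {1, 4}, {5}]
print(shared_alterations(cells[0], cells[1]))  # ids 1 and 2 are shared
print(frequent_alterations(cells))             # ids 1 (3 cells) and 2 (2 cells)
```

To get a curve like Fig. 4e, average `shared_alterations` over many randomly sampled pairs of cells, grouped by their separation r.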

In summary, our model accounts for many facts observed clinically 
and experimentally. Our results are robust and many assumptions can 
be relaxed without qualitatively affecting the outcome (Methods and 
Supplementary Information). Although tumour cell migration has 
Figure 4 | Genetic diversity is strongly reduced by the emergence of driver
mutations. a-f, For all panels, M = 0 and the initial net growth rate is 0.007 day⁻¹
(d = 0.99b). The three most abundant genetic alterations (GAs) have been
colour-coded using red (R), green (G) and blue (B) (c). Each section is 80 cells
thick. Combinations of the three basic colours correspond to cells having two or
three of these genetic alterations. a, No drivers: separated, conical sectors
emerge in different parts of the lesion, each corresponding to a different clone.
b, Drivers with selective advantage s = 1% lead to clonal expansions and
many cells have all three genetic alterations (white area). d, Genetic diversity
can be determined quantitatively by randomly sampling pairs of cells separated
by distance r and counting the number of shared genetic alterations. e, The
number of shared genetic alterations versus the normalized distance r/<r>
decreases much more slowly for the case with (red) than without (blue) driver
mutations. f, The total number of genetic alterations present in at least
50% of all cells is much larger for s = 1% than for s = 0%. Number of
samples = 50 per data point. Error bars are s.e.m.


historically been viewed as a feature of cancer associated with late
events in tumorigenesis, such as invasion through basement membranes
or vascular walls, this classical view of migration pertains to
the ability of cancer cells to migrate over large distances. Instead, our
analysis reveals that even small amounts of localized cellular movement
are able to markedly reshape a tumour. Moreover, we predict
that the rate of tumour growth can be substantially altered by a change 
in dispersal rate of the cancer cells, even in the absence of any changes 
in doubling times or net growth rates of the cells within the tumour. 
Some of our predictions could be experimentally tested using new cell 
labelling techniques****. Our results could also greatly inform the 
interpretation of mutations in genes whose main functions seem to 
be related to the cytoskeleton or to cell adhesion rather than to cell 
birth, death, or differentiation’””*. For example, cells that have lost the 
expression of E-cadherin (a cell adhesion protein) are more migratory 
than normal cells with intact E-cadherin expression”, and loss of 
E-cadherin in pancreatic cancer has been associated with poorer
prognosis*®, in line with our predictions.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. Experiments were 
not randomized and investigators were not blinded to allocation during experi- 
ments and outcome assessment. 

Spatial model for tumour evolution. Tumour modelling has a long tradition’’.
Many models of spatially expanding tumours have been proposed in the past,
but they either assume very few new mutations or none at all,
or one- or two-dimensional growth. On the other hand, well-mixed
models with several mutations often do not include space, and computational
models aimed at being more biologically realistic require too many
computing resources (time and memory) to simulate realistically large tumours
(N ~ 10° cells). Our model builds on the Eden lattice model® and combines spatial
growth and accumulation of multiple mutations. Since we focus on the interplay of
genetics, spatial expansion and short-range dispersal of cells, for simplicity we do
not explicitly model metabolism”, tissue mechanics, spatial heterogeneity of
tissues, different types of cells present or angiogenesis’®.

A tumour is made of non-overlapping balls (microlesions) of cells. Tumour cells 
occupy sites of a regular 3D square lattice (Moore neighbourhood, 26 neighbours). 
Empty lattice sites are assumed to be either normal cells or filled with extracellular 
matrix and are not modelled explicitly. Each cell in the model is described by its
position, a list of genetic alterations that have occurred since the initial neoplastic
cell, and information about whether a given mutation is a passenger,
driver, or resistance-carrying mutation. A passenger mutation does not affect the
net growth rate, whereas a driver mutation increases it by disrupting the tight
regulation of cell division and shifting the balance towards increased proliferation or
decreased apoptosis. The changes can also be epigenetic and we do not distinguish
between different types of alterations. We assume that each genetic alteration
occurs only once (‘infinite allele model’). The average numbers of all genetic
alterations, driver alterations and resistance alterations produced in a single replication
event are denoted by γ, γ_d and γ_r, respectively. When a cell replicates, each of the
daughter cells receives n new genetic alterations of each type (n being generally
different in the two cells), drawn at random from the Poisson probability distribution:


$$P(n) = \frac{e^{-\gamma_x/2}\,(\gamma_x/2)^n}{n!} \qquad (1)$$
in which x denotes the type of genetic alteration.
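Equation (1) says that each daughter receives a Poisson-distributed number of new alterations with mean γ_x/2, so one division adds γ_x new alterations on average. Below is a minimal sketch under the infinite-allele assumption, in which every new alteration gets a fresh integer id. The sketch makes three assumptions of ours:

- The Poisson draw uses Knuth's multiplication method, because Python's `random` module has no Poisson generator.
- The alteration types are treated as independent streams, which is a simplification.
- The dictionary layout, the function names and the rate in the usage line are hypothetical.

```python
import math
import random
from itertools import count

_new_id = count(1)  # infinite-allele model: every new alteration gets a unique id

def poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    n, p = 0, rng.random()
    while p > limit:
        n += 1
        p *= rng.random()
    return n

def daughter(parent, gammas, rng):
    """Genetic alterations of one daughter cell.

    parent: dict mapping alteration type -> set of ids inherited so far.
    gammas: dict mapping alteration type -> gamma_x, the mean number of new
            alterations of that type per division; each daughter receives
            Poisson(gamma_x / 2) new ones, as in equation (1).
    """
    child = {kind: set(ids) for kind, ids in parent.items()}  # copy, parent untouched
    for kind, gamma in gammas.items():
        for _ in range(poisson(gamma / 2, rng)):
            child.setdefault(kind, set()).add(next(_new_id))
    return child

rng = random.Random(0)
parent = {"all": set()}
d1 = daughter(parent, {"all": 0.02}, rng)  # illustrative rate only
d2 = daughter(parent, {"all": 0.02}, rng)
```

Because ids are never reused, any id found in two cells must have been inherited from a common ancestor. This is what makes the shared-alteration counts in Fig. 4 meaningful.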

In model A shown in Figs 2-4, replication occurs stochastically, with rate 
proportional to the number of empty sites surrounding the replicating cell, and 
death occurs with constant rate depending only on the number of drivers. We also 
simulated other scenarios (models B, C and D, see below). Driver mutations 
increase the net growth rate (the difference between proliferation and death) either 
by increasing the birth rate or decreasing the death rate by a constant factor 1+ s, 
in which s > 0. 

Dispersal is modelled by moving an offspring cell to a nearby position where it 
starts a new microlesion (Extended Data Fig. 1a). Microlesions repel each other; a 
‘shoving’ algorithm®*® (Extended Data Fig. 1b) ensures they do not merge. 
Code availability. The computer code (available at http://www2.ph.ed.ac.uk/~bwaclaw/cancer-code)
can handle up to 1 × 10⁹ cells, which corresponds to
tumours that are clinically meaningful and can be observed by conventional
medical imaging (diameter >1 cm). The algorithm is discussed in detail in the
Supplementary Information. It is not an exact kinetic Monte Carlo algorithm,
because such an algorithm would be too slow to simulate large tumours. A
comparison with kinetic Monte Carlo for smaller tumours (Supplementary
Information) shows that both algorithms produce consistent results.
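For reference, an exact kinetic Monte Carlo scheme (Gillespie's direct method) works in two steps. It draws an exponential waiting time from the total event rate, then picks which event fires with probability proportional to its rate. The sketch below applies this only to a well-mixed linear birth-death process. It omits space, mutations and dispersal, is not the authors' algorithm, and the function name is hypothetical. The scheme costs one loop iteration per event, which is why it becomes too slow for billion-cell tumours.

```python
import random

def gillespie_birth_death(n0, b, d, t_max, rng, n_max=100_000):
    """Exact stochastic simulation of a well-mixed birth-death process.

    Each of the n cells divides at rate b and dies at rate d (b + d > 0), so
    the total event rate is (b + d) * n. Stops at extinction, at n_max cells,
    or when the next event would fall after t_max.
    Returns (time of the last event, population size).
    """
    if b + d <= 0:
        raise ValueError("need b + d > 0")
    t, n = 0.0, n0
    while 0 < n < n_max:
        wait = rng.expovariate((b + d) * n)  # time to the next event
        if t + wait > t_max:
            break
        t += wait
        if rng.random() < b / (b + d):       # birth with probability b / (b + d)
            n += 1
        else:
            n -= 1
    return t, n

rng = random.Random(42)
print(gillespie_birth_death(n0=10, b=0.69, d=0.35, t_max=10.0, rng=rng))
```

Averaged over many runs, n at time t should approach n0 · exp((b − d)t), which is a quick sanity check for any faster approximate scheme.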

Model parameters. The initial birth rate b = ln(2) ≈ 0.69 day⁻¹, which corresponds
to a 24-h minimum doubling time. The initial death rate d = 0 to 0.995b
depends on the aggressiveness of the tumour (larger values = less aggressive lesion).
In simulations of targeted therapy, we assume that, before treatment, b = 0.69
day⁻¹ and d = 0.5b ≈ 0.35 day⁻¹, whereas during treatment b = 0.35 day⁻¹
and d = 0.69 day⁻¹; that is, birth and death rates swap places. This rather arbitrary
choice leads to a regrowth time of about 6 months, which agrees well with clinical
evidence. Mutation probabilities are γ = 0.02, γ_d = 4 × 10⁻⁵ and γ_r = 1 × 10⁻⁷, in line
with experimental evidence and theoretical work***®. Since there are no reliable
data on the dispersal probability M, we have explored a range of values between
M=1X10 ’and1 10 *. All parameters are summarized in Extended Data Fig.
1c; see also further discussion in Supplementary Information.
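The targeted-therapy rates above amount to swapping b and d for sensitive cells. Below is a minimal bookkeeping sketch; the class and variable names are hypothetical. It shows that the untreated lesion grows at b − d = 0.5b ≈ 0.35 per day, while sensitive cells under therapy shrink at the same rate.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Rates:
    """Per-cell birth and death rates, in units of 1/day."""
    b: float
    d: float

    @property
    def net(self) -> float:
        """Net growth rate b - d (negative means the population shrinks)."""
        return self.b - self.d

B_MAX = math.log(2)                                       # 24 h minimum doubling time
untreated = Rates(b=B_MAX, d=0.5 * B_MAX)
sensitive_on_drug = Rates(b=untreated.d, d=untreated.b)   # rates swap places
resistant_on_drug = untreated                             # resistant cells keep their rates

for name, r in [("untreated", untreated), ("sensitive, treated", sensitive_on_drug)]:
    print(f"{name}: b={r.b:.2f} d={r.d:.2f} net={r.net:+.2f} per day")
```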

Validity of the assumptions of the model. Our model is deliberately oversimplified.
However, many of the assumptions we make can be experimentally justified
or shown not to qualitatively affect the model.

Three-dimensional regular lattice of cells. The 3D Moore neighbourhood was 
chosen because it is computationally fast and introduces relatively fewer artefacts 
related to lattice symmetries. Real tissues are much less regular and the number of 
nearest neighbours is different®’. However, recent simulations of similar models of 
bacterial colonies””! show that the structure of the lattice (or the lack thereof in
off-lattice models) has a marginal effect on genetic heterogeneity. 
Asynchronous cell division. Division times of related cells remain correlated for a 
few generations. However, stochastic cell division implemented in our model is a 
good approximation for a large mass of cells and is much less computationally 
expensive than modelling a full cell cycle. 

Replication faster at the boundary than in the interior. Several studies have 
described a higher proliferation rate at the leading edge of tumours, and this has 
been associated with a more aggressive clinical course”. To estimate the range of 
values of death rate d for our model, we used the proliferation marker Ki67. 
Representative formalin-fixed, paraffin-embedded tissue blocks were selected 
from four small chromophobe renal cell carcinomas and six small hepatocellular 
carcinomas by the pathologist (M.E.P.). A section of each block was immunolabelled
for Ki67 using the Ventana Benchmark XT system. Around 8-12 images,
depending on the size of the lesion, were acquired from each tumour. Fields were 
chosen at random from the leading edge and the middle of the tumour and were 
not necessarily ‘hot spots’ of proliferative activity. Using an ImageJ macro, each 
Ki67-positive tumour nucleus was labelled green by the pathologist, and each 
Ki67-negative tumour nucleus was labelled red. Other cell types (endothelium, 
fibroblasts and inflammatory cells) were not labelled. The proliferation rate was 
then calculated using previously described methods”. Statistical significance of the
results was determined using a Kolmogorov-Smirnov two-sample test (significance
level 0.05). The study was approved by the Institutional Review Board of the
Johns Hopkins University School of Medicine. In all ten tumours, the proliferation
rate at the leading edge of the tumour was greater than that at the centre by a factor
of 1.25 to 6 (Extended Data Table 1). Comparing the density of proliferating cells
with our model gives d ≈ 0.5b (range: d = 0.17b to 0.8b), which is what we assume in
the simulations of aggressive lesions.
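The two-sample Kolmogorov-Smirnov statistic used above is the largest vertical gap between the empirical cumulative distribution functions of the two samples. Here the two samples could be, for instance, proliferation fractions measured in leading-edge fields and in central fields. The dependency-free sketch below computes only the statistic. In practice `scipy.stats.ks_2samp` returns both the statistic and the p-value needed for the 0.05 significance test. The function name and the example values are hypothetical, not data from Extended Data Table 1.

```python
from bisect import bisect_right

def ks_2samp_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|.

    F_x and F_y are the empirical CDFs; the maximum gap is always attained
    at one of the observed values, so only those need to be checked.
    """
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        gap = max(gap, abs(fx - fy))
    return gap

# Hypothetical Ki67-positive fractions per imaged field (illustration only).
edge = [0.31, 0.42, 0.27, 0.38, 0.45]
centre = [0.12, 0.09, 0.21, 0.15, 0.18]
print(ks_2samp_statistic(edge, centre))
```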

Equal fitness of all cells in metastatic lesions. We assume that cells in a meta- 
static lesion are already very fit since they contain multiple drivers. Indeed, studies 
of primary tumours and their matched metastases usually fail to find driver mutations
present in the metastases that were not present in the primary lesions””*,
although there are notable exceptions, see, for example, refs 75 and 76. 
Experimental evidence in microbes” and (to a lesser extent) in eukaryotes” sug- 
gests that fitness gains due to individual mutations are largest at the beginning of 
an evolutionary process and that the effects of later mutations are much smaller. It 
remains to be seen how well these results apply to late genetic alterations in 
cancer” but if true, new drivers occurring in the lesion are unlikely to spread 
through the population before the lesion reaches a clinically relevant size. 
Dispersal. In our model, cells detach from the lesion and attach again at a different 
location in the tissue. This can be viewed either as cells migrating from one place to 
another one, or as a more generic mechanism that allows tumour cells to get better 
access to nutrients by dispersing within the tissue, hence providing a growth 
advantage over cells that did not disperse. Some mechanisms that do not involve 
active motion (that is, cells becoming motile) are discussed below. 

Migration. Cancer cells are known to undergo epithelial-to-mesenchymal transition,
the origin of which is thought to be epigenetic’®. This involves a cell becoming
motile and moving some distance. If the cell finds the right environment, it can
switch back to the non-motile phenotype and start a new lesion. Motility can be
enhanced by tissue fluidization due to replication and death*. Instead of modelling
the entire cycle (epithelial-mesenchymal-epithelial), we only model the
final outcome (a cell has moved some distance).

Tumour buds. Many tumours exhibit focally invasive cell clusters, also known as
tumour buds. Their proliferation rate is lower than that of cells in the main tumour™.
We propose that tumour buds contain cells that have not yet completed the
epithelial-to-mesenchymal transition and therefore proliferate more slowly.

Single versus cluster migration. Ref. 82 found that circulating cancer cells can 
travel in clusters of 2-50 cells, and that such clusters can initiate metastatic foci. 
They report that approximately one-half of the metastatic foci they examined were 
initiated by single circulating cancer cells, and that circulating cancer cell clusters 
initiated the other half. The authors also note that the cells forming a cluster are 
probably neighbouring cancer cells from the primary tumour. This means that the 
genetic make-up of cells within a newly established lesion will be very similar, 
regardless of its origin (single cell versus a small cluster of cells). Therefore, the 
ability to travel in clusters should not affect the genetic heterogeneity or regrowth 
probability as compared to single-cell dispersal from our model. 

Angiogenesis. We do not explicitly model angiogenesis for two reasons. First, 
most genetic alterations that can either change the growth rate or be detected 
experimentally must occur at early stages of tumour growth as explained before. 
Hence, the genetic make-up of the tumour is determined primarily by what 
happens before angiogenesis. Second, local dispersal from the model mimics 
tumour cells interspersing with the vascularized tissue and getting better access 
to nutrients, which is one of the outcomes of angiogenesis. 
Biomechanics of tumours. Growth is affected by the mechanical properties of
cells and the extracellular matrix. We do not explicitly include biomechanics (see,
however, below), in contrast to more realistic models**™*, as this would not allow us
to simulate lesions larger than about 1 X 10° cells. Instead, we take experimentally
determined values for birth and death rates, values that are affected by
biomechanics, as the parameters of our model.

Isolated balls of cells. In our simulations, balls of cells are thought to be separated
by normal, vascularized tissue which delivers nutrients to the tumour. The
environment of each ball is the same, and there are no interactions between the balls
other than mechanical repulsion. This represents a convenient mathematical
contrivance and qualitatively recapitulates what is observed in stained sections of
actual tumours (Fig. 1a). We investigated under which circumstances the balls 
of cancer cells would mechanically repel each other; see Extended Data Fig. 7 for a 
graphical summary of the results. We simulated a biomechanical, off-lattice model 
of normal tissue composed of ‘ducts’ lined with epithelial cells and separated by 
stroma (Supplementary Information, section 8). Mechanical interactions between 
cells were modelled using an approach similar to that described previously’, 
with model parameters taken from refs 59, 60, 85-88. We assumed cancer cells to 
be of epithelial origin, as are most cancers’. Cancer cells that invaded different 
areas of epithelium grew into balls that remained separated by thin slices of stroma 
(Supplementary Videos 8-11). This ‘encapsulation’ of tumour microlesions was 
possible owing to the supportive nature of stroma that is able to mechanically resist 
expansion of balls of cancer cells. Encapsulation is essential if the balls are to repel 
each other. If the tissue is ‘fluidized’ by random replication and death, the balls
quickly merge (Supplementary Video 12). Another important factor is the difference
in mechanical properties between tumour and normal cells”; it is known that
differences in cellular adhesion and stiffness promote segregation of different types
of cells?!”?.

In reality, microlesions within the primary tumour are less symmetric and some 
of them are better described as ‘protrusions’ bulging out from the main tumour 
tissue, owing to biomechanical instabilities; see, for example, refs 93, 94. However, 
stroma may still provide enough spatial separation, and the capillary network of 
blood vessels—either due to tumour angiogenesis or preexisting in the invaded 
tissue—may provide enough nutrients to the lesions so that our assumption of 
independently growing balls of cells remains valid. Therefore, we believe that 
modelling the tumour as a collection of non- or weakly-interacting microlesions 
is essentially correct. We also note that the existence of isolated balls is not necessary
to explain our qualitative results: reduced heterogeneity and increased growth
in the presence of migration. Supplementary Video 13 shows that even if the tissue
is homogeneous and highly dynamic and there are no isolated balls of cells,
migration leads to a considerable speedup of growth as compared to the case with 
no migration (Supplementary Video 14). 

Tumour geometry and heterogeneity in the absence of driver mutations. 
Supplementary Videos 2 and 3 illustrate the process of growth of a tumour with
at most N = 10’ cells, for M = 0 and M = 10%, respectively, and for d = 0.5b.
Extended Data Fig. 2 shows snapshots from a single simulation for M = 0,
N ~ 10°, and d = 0 (no death, Extended Data Fig. 2a) and d = 0.9b (Extended
Data Fig. 2b). In the latter case, cells are separated by empty sites (normal cells/
extracellular matrix). Extended Data Fig. 2c shows that the tumour is almost 
spherically symmetric for M = 0. The symmetry is lost for small but non-zero 
M, and restored for larger M when the balls become smaller and their number 
increases. Extended Data Fig. 2c also shows that metastatic tumours contain many 
clonal sectors with passenger mutations. Extended Data Fig. 8a shows that the 
fraction G(r) of genetic alterations that are the same in two randomly sampled cells 
(Fig. 4) separated by distance r quickly decreases with r, indicating increased 
genetic heterogeneity owing to passenger mutations. 

Targeted therapy of metastatic lesions. Models of cancer treatment often
either assume no spatial structure or do not model the emergence of resistance. We
assume that the cell that initiated the lesion was sensitive to treatment but its
progeny may become resistant. Before the therapy commences, all cells have the
same birth and death rates, but after treatment starts, resistant cells continue to
proliferate at the same rate, whereas susceptible cells are assigned different rates
as described above. Resistant cells can emerge before and during the therapy. The
death rate of sensitive cells during treatment is greater than the birth rate, or the
tumours would not be sensitive to the drug. For example, in Fig. 3, treatment
increases the death rate and decreases the birth rate of susceptible cells; the
growth rate of resistant cells after therapy is identical to that of the sensitive cells
before treatment (d = 0.5b in the absence of treatment); M = 10°; and treatment
begins when the tumour has N = 10’ cells.

Note that our model assumes the drug is uniformly distributed in the tumour”; 
it is known that drug gradients can speed up the onset of resistance’. 

Supplementary Video 1 and Extended Data Fig. 4a show that, since the 
process of resistance acquisition is stochastic, some tumours regrow after an initial 
regression, and some do not. If only resistant cells can migrate, regrowth is faster
(Extended Data Fig. 4b, c). Extended Data Fig. 4d-g shows regrowth probabilities
P_regrowth for different treatment scenarios not mentioned in the main text,
depending on whether the drug is cytostatic (b_treatment = 0) or cytocidal (d_treatment = b),
and whether d = 0 or d > 0 before treatment. In Extended Data Fig. 4d, cells
replicate and die only on the surface, and the core is ‘quiescent’: cells are still
alive there but cannot replicate unless outer layers are removed by treatment
(Supplementary Videos 6 and 7). P_regrowth does not depend on the dispersal
probability M at all, and is close to 100% for N > 10° cells, a size that is larger than for
d > 0 (Extended Data Fig. 4f). It can be shown that P_regrowth = 1 − exp(−γ_r N).
Extended Data Fig. 4e is for the cytostatic drug (b_treatment = d_treatment = 0); this is
also equivalent to the cytocidal drug if the tumour has a necrotic core (cells are dead
but still occupy physical volume). In this case, P_regrowth increases with M because
more resistant cells are on the surface for larger M (cells can replicate only on the
surface in this scenario). Extended Data Fig. 4f, g shows models with cell death
present even in the absence of treatment (d = 0.9b) but occurring only at the
surface, unlike in Fig. 3, where cells also die inside the tumour. Death increases
P_regrowth owing to the larger number of cell divisions needed to reach the same
size, and hence more opportunities to mutate.
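The closed form P_regrowth = 1 − exp(−γ_r N) has a simple heuristic reading (ours, not the authors' proof). With no cell death, growing to N cells takes about N − 1 divisions, each adding γ_r resistance mutations on average. The number of resistant founders is therefore approximately Poisson with mean γ_r N, and regrowth occurs if at least one is present. The short sketch below assumes γ_r = 10⁻⁷ per division; the function name is hypothetical.

```python
import math

def p_regrowth(n_cells, gamma_r=1e-7):
    """Probability that at least one resistant cell exists: 1 - exp(-gamma_r * N).

    -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    """
    return -math.expm1(-gamma_r * n_cells)

for n in (1e5, 1e6, 1e7, 1e8):
    print(f"N = {n:.0e}: P_regrowth = {p_regrowth(n):.3f}")
```

The sketch shows why P_regrowth saturates once N is well above 1/γ_r.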

Relaxing the assumptions of the model. Figure 4 shows that even a small fitness
advantage substantially reduces genetic diversity through the process of clonal
expansion; see also Supplementary Videos 4 and 5. We now demonstrate that this
also applies to modified versions of the model, which shows that the model is robust.

Exact values of M and s have no qualitative effect. Extended Data Fig. 8b, e shows
that the average number of shared genetic alterations is larger in the presence of
drivers also in the case of non-zero dispersal (M > 0), and its numerical value is
almost the same as for M = 0 (Fig. 4). Extended Data Fig. 8c, f shows that as long as
s > 0, and regardless of its exact value, driver mutations reduce genetic diversity in
the tumour compared with the case s = 0. Extended Data Fig. 5a-c shows how many
driver mutations are expected to be present in a randomly chosen cell from a
tumour that is T years old. Neither dispersal nor the way drivers affect growth (via
birth or death rate) has a significant effect on the number of drivers per cell
(Extended Data Fig. 5b, c). A small discrepancy visible in Extended Data Fig. 5b
is caused by the slightly asymmetric way in which death and birth are treated in our
model; see the Supplementary Information.

Model B. Cells replicate at a constant rate if there is at least one empty neighbour.
In the absence of drivers, genetic alterations are distributed evenly throughout the
lesion (Extended Data Fig. 6b), but they often occur independently and the number
of frequent genetic alterations is low (Extended Data Fig. 6e). Drivers cause clonal
expansion as in model A.
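The difference between model A (replication rate proportional to the number of empty neighbours, per Extended Data Fig. 6a) and model B (constant rate if at least one neighbour is empty) can be sketched on a cubic lattice with the 26-site Moore neighbourhood. Normalizing model A by 26, so that a cell with all neighbours empty replicates at rate b, is an assumption of this sketch, as are the function names.

```python
from itertools import product

# Moore neighbourhood on a cubic lattice: the 26 offsets with coordinates in {-1, 0, 1}.
MOORE = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def count_empty_neighbours(occupied: set, site: tuple) -> int:
    """Number of the 26 Moore neighbours of `site` that are not in `occupied`."""
    x, y, z = site
    return sum((x + dx, y + dy, z + dz) not in occupied for dx, dy, dz in MOORE)


def replication_rate_model_a(occupied: set, site: tuple, b: float) -> float:
    """Model A: rate proportional to the number of empty neighbours (assumed scale b/26)."""
    return b * count_empty_neighbours(occupied, site) / len(MOORE)


def replication_rate_model_b(occupied: set, site: tuple, b: float) -> float:
    """Model B: constant rate b if at least one neighbour is empty, otherwise zero."""
    return b if count_empty_neighbours(occupied, site) > 0 else 0.0
```

A cell with a single empty neighbour replicates at b/26 in model A but at the full rate b in model B, which is why model B lets cells just below the surface keep proliferating.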

Model C. Cells replicate regardless of whether there are empty sites surrounding
them or not. When a cell replicates, it pushes other cells away towards the surface
(Supplementary Information). Extended Data Fig. 6c, e shows that this again leads
to clonal expansion, which decreases diversity.

Model D. Replication/death occurs only on the surface and the core of the tumour
is static. Extended Data Fig. 6d shows that driver mutations cannot spread to the
other side of the lesion, and conical clonal sectors can be seen even for s > 0. The
number of frequent genetic alterations is the same for s = 0 and s = 1%, indicating
that genetic heterogeneity is not lowered by clonal expansion. This demonstrates
that cell turnover inside the tumour is very important for reducing heterogeneity.
To obtain the same (low) heterogeneity as for models A-C, the probability of driver
mutations must be much larger in model D (Extended Data Fig. 6f).

Drivers affecting M. We investigated three scenarios in which drivers affect
(1) only the dispersal probability, M → (1 + q)M, in which q > 0 is the 'migration
fitness advantage' (no change in b, d); (2) both M and d, that is, (d, M) →
(d(1 − s), (1 + q)M) with s, q > 0; (3) either M or d, each with probability 1/2.
Extended Data Fig. 3c shows that growth is unaffected in cases (1) and (3) compared
to the neutral case. For (2), the tumour growth rate increases significantly when the
tumour is larger than N = 1 × 10° cells. This shows that migration increases the
overall fitness advantage, in line with ref. 102, which shows that fixation probability
is determined by the product of the exponential growth rate and the diffusion
constant (motility) of organisms.
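The three driver scenarios can be written as a single update rule applied when a cell acquires a driver. This is a minimal sketch under the definitions above; the `CellParams` type and `apply_driver` name are hypothetical, and the random draw in scenario (3) uses Python's standard `random.Random`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CellParams:
    d: float  # death rate
    M: float  # dispersal probability


def apply_driver(p: CellParams, scenario: int, s: float, q: float,
                 rng: random.Random) -> CellParams:
    """Return the cell's (d, M) after it acquires one driver mutation.

    1: M -> (1 + q) M, d unchanged
    2: (d, M) -> (d (1 - s), (1 + q) M)
    3: with probability 1/2 apply M -> (1 + q) M, otherwise d -> d (1 - s)
    """
    if scenario == 1:
        return CellParams(p.d, (1 + q) * p.M)
    if scenario == 2:
        return CellParams(p.d * (1 - s), (1 + q) * p.M)
    if scenario == 3:
        if rng.random() < 0.5:
            return CellParams(p.d, (1 + q) * p.M)
        return CellParams(p.d * (1 - s), p.M)
    raise ValueError("scenario must be 1, 2 or 3")
```

With q = 9, as in Extended Data Fig. 3c, a driver increases M tenfold.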

Six-site (von Neumann) neighbourhood. We simulated a model in which each
cell has only six neighbours (von Neumann neighbourhood) instead of 26 (Moore
neighbourhood). Extended Data Fig. 9 compares models A and C for the two
neighbourhoods and shows that there is only a small quantitative difference in the
growth curves for model A (model C is unaffected), but that the shape of the ball of
cells deviates more from a sphere for the six-site neighbourhood; see also
section 7 in the Supplementary Information.
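The two neighbourhoods differ only in which lattice offsets count as neighbours. The sketch below builds both sets and, for an idealized flat tumour front (every site with z ≤ 0 occupied, an assumption of this illustration), computes the fraction of neighbours that are empty. Function names are hypothetical.

```python
from itertools import product


def moore_offsets():
    """The 26 lattice offsets with each coordinate in {-1, 0, 1}, excluding the origin."""
    return [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def von_neumann_offsets():
    """The 6 unit steps along the coordinate axes (Manhattan distance 1)."""
    return [o for o in moore_offsets() if sum(abs(c) for c in o) == 1]


def empty_fraction_on_flat_front(offsets):
    """Fraction of empty neighbours for a cell at the origin when every site
    with z <= 0 is occupied (an idealized flat tumour surface)."""
    return sum(1 for (_, _, dz) in offsets if dz > 0) / len(offsets)


if __name__ == "__main__":
    for name, offs in (("Moore", moore_offsets()), ("von Neumann", von_neumann_offsets())):
        print(name, len(offs), round(empty_fraction_on_flat_front(offs), 3))
```

On this flat front, 9 of 26 Moore neighbours but only 1 of 6 von Neumann neighbours are empty. Because model A's replication rate scales with the number of empty neighbours, this hints at why growth rates can differ between the two lattices; the Supplementary Information gives the authors' own treatment.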
a, X_j = X_i + R_i x_i/|x_i|
b, Overlap reduction
c,
| Parameter | Meaning | Value | References |
|---|---|---|---|
| b | Birth rate | ln 2 = 0.69 day^-1 | This work |
| d | Death rate | 0 ... 0.995b | This work |
| s | Selective advantage | 0 ... 0.1 (= 0 ... 10%) | This work |
| γ | Mutation probability (all GAs) | 0.02 (*) | [8,66,67,68] |
| γ_d | Mutation probability (drivers only) | 4 × 10^-5 (*) | [8,66,67,68] |
| γ_r | Mutation probability (resistance-carrying mutations) | 10^-7 (*) | [8,66,67,68] |
| M | Dispersal probability | 0 ... 10+ | This work |


Extended Data Figure 1 | Details of the model. a, A sketch showing how
dispersal is implemented: (1) A ball of cells of radius R_i, with its centre
at X_i, is composed of tumour cells and normal cells (blue and empty squares
in the zoomed-in rectangle (2)). A cell at position x_i with respect to the centre
of the ball attempts to replicate (3). If replication is successful, the cell
migrates with probability M and creates a new microlesion (4). The position X_j
of this new ball of cells is determined as the endpoint of the vector that starts
at X_i and has direction x_i and length R_i. b, Overlap reduction between the
balls of cells. When a growing ball begins to overlap with another ball (red), both
are moved apart along the line connecting their centres of mass (green
line) by as much as necessary to reduce the overlap to zero. The process is
repeated for all overlapping balls as many times as needed until there is no
overlap. c, Summary of all parameters used in the model. If, for a given
parameter, many different values have been used in different plots, the range of
values used is shown. Birth and death rates can also depend on the number
of driver mutations, see Methods. Asterisk, parameter estimated from other
quantities available in the literature.
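The dispersal step (panel a) and the overlap-reduction step (panel b) can be sketched as follows. The new centre follows the caption's rule X_j = X_i + R_i x_i/|x_i|. For overlap removal, this sketch makes two assumptions that the caption does not state: it uses geometric centres in place of centres of mass, and it splits each correction equally between the two balls. Names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def new_ball_centre(X_i: Sequence[float], x_i: Sequence[float], R_i: float) -> Tuple[float, ...]:
    """X_j = X_i + R_i * x_i / |x_i|: centre of the microlesion seeded by a migrating cell."""
    norm = math.sqrt(sum(c * c for c in x_i))
    if norm == 0.0:
        raise ValueError("x_i must be a non-zero offset from the ball centre")
    return tuple(X + R_i * c / norm for X, c in zip(X_i, x_i))


@dataclass
class Ball:
    centre: List[float]  # geometric centre, used here in place of the centre of mass
    radius: float


def remove_overlaps(balls: List[Ball], max_sweeps: int = 1000, tol: float = 1e-9) -> List[Ball]:
    """Repeatedly push overlapping pairs apart along the line joining their centres."""
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(balls)):
            for j in range(i + 1, len(balls)):
                a, b = balls[i], balls[j]
                delta = [cb - ca for ca, cb in zip(a.centre, b.centre)]
                dist = math.sqrt(sum(c * c for c in delta))
                overlap = a.radius + b.radius - dist
                # Coincident centres (dist == 0) have no defined direction and are skipped.
                if overlap > tol and dist > 0.0:
                    # Assumption: each ball moves by half the overlap, in opposite directions.
                    shift = [0.5 * overlap * c / dist for c in delta]
                    a.centre = [c - s for c, s in zip(a.centre, shift)]
                    b.centre = [c + s for c, s in zip(b.centre, shift)]
                    moved = True
        if not moved:
            return balls
    raise RuntimeError("overlaps not resolved within max_sweeps")
```

Repeating sweeps until no pair moves mirrors the caption's "as many times as needed until there is no overlap"; the `max_sweeps` cap is a safeguard added for the sketch.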
Extended Data Figure 2 | Simulation snapshots. a, b, A few snapshots of
tumour growth for no dispersal, and d = 0 (a) and d = 0.9b (b). To visualize
clonal sectors, cells have been colour-coded by making the colour a
heritable trait and changing each of its RGB components by a small random
fraction whenever a cell mutates. The initial cell is grey. Empty space (white)
represents non-cancer cells mixed with extracellular matrix. Note that images are not
to scale. c, Tumour shapes for N = 1 × 10’, d = 0.9b, and different dispersal
probability M. Images not to scale; the tumour for M = 1 × 10° ° is larger
than the one for M = 0.
Extended Data Figure 3 | Tumour size as a function of time. a, Growth of
a tumour without dispersal (M = 0), for d = 0.8. For large times (T), the
number of cells grows approximately as const × T^3. The tumour reaches size
N = 1 × 10” cells (horizontal line) after about 100 months (8 years) of growth.
b, The same data plotted on a linear scale, with N replaced by the 'linear
extension' N^{1/3}. c, Tumour size versus time when drivers affect the dispersal
probability. In all cases, d = 0.9b, and (1, black) drivers increase the dispersal
rate tenfold (q = 9) but have no effect on the net growth rate; (2, red)
drivers increase both the net growth rate (s = 10%) and M; (3, green) drivers
either (with probability 1/2) increase M tenfold (q = 9) or increase the net
growth rate by s = 10%; (4, blue) drivers increase only the net growth rate by
s = 10%; and (5, black dashed line) neutral case with M = 1 × 10 ”, which is
indistinguishable from (1). In cases (1-3), the initial dispersal probability is
M = 1 × 10 ’. Points represent the average value over 40-100 simulations per data
point; error bars are s.e.m.
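The T^3 law in panel a is what one would expect if, presumably, only a surface layer of roughly constant thickness proliferates: the radius then grows about linearly in time, so N^{1/3} (panel b) grows linearly. A minimal sketch of estimating the exponent from a growth curve by a least-squares fit on log-log axes; the helper names and the synthetic data in the test are hypothetical.

```python
import math


def loglog_slope(times, sizes):
    """Least-squares slope of log(N) versus log(T); close to 3 if N ~ const * T**3."""
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def linear_extension(sizes):
    """N**(1/3), which grows linearly in T when N ~ T**3."""
    return [n ** (1.0 / 3.0) for n in sizes]
```

In practice, the fit should be restricted to late times, where the caption says the T^3 behaviour holds.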
Extended Data Figure 4 | Simulation of targeted therapy. a-c, The total
number of cells in the tumour (black) and the number of resistant cells (red)
versus time, during growth (T < 0) and treatment (T > 0), for ~100
independent simulations, with d = 0.5b for T < 0. Therapy begins when
N = 1 × 10° cells. After treatment, many tumours die out (N decreases to zero),
but those with resistant cells will regrow sooner or later. a, M = 0 for all cells at
all times. b, M = 0 for all cells for T < 0 and M = 10 * for resistant cells for
T > 0. c, M = 0 for non-resistant and M = 10 ° for resistant cells at all times. In
all three cases, P_regrowth is very similar: 36 ± 5% (mean ± s.e.m.) (a), 25 ± 4%
(b), and 27 ± 6% (c). d-g, Regrowth probability for four treatment scenarios
not discussed in the main text. Data points correspond to three dispersal
probabilities: M = 0 (red), M = 1 × 10° (green), and M = 1 × 10° * (blue).
The probability is plotted as a function of tumour size N just before therapy
commences. d, Before treatment, cells replicate only on the surface. Cells in
the core are quiescent and do not replicate. Therapy kills cells on the surface, and
cells in the core resume proliferation when liberated by treatment. e, As in
d, but the drug is cytostatic: it does not kill cells but inhibits their growth. The
results for P_regrowth are identical if the drug is cytotoxic and the tumour has a
necrotic core (cells die inside the tumour and cannot replicate even if the
surface is removed). f, Before treatment, cells replicate and die on the surface.
The core is quiescent. Therapy kills cells on the surface (cytotoxic drug).
g, As in f, but therapy only inhibits growth (cytostatic drug). In all cases
(d-g), error bars represent s.e.m. from 8-1,000 simulations per point.
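P_regrowth is the fraction of independent simulations in which the tumour regrows. If each run is treated as an independent Bernoulli trial (an assumption of this sketch), its standard error is sqrt(p(1 − p)/n). The function name and the counts used below are hypothetical; the caption states only "~100 independent simulations".

```python
import math


def regrowth_probability(regrew: int, runs: int):
    """Fraction of runs that regrew and its binomial standard error."""
    if runs <= 0 or not 0 <= regrew <= runs:
        raise ValueError("need runs > 0 and 0 <= regrew <= runs")
    p = regrew / runs
    sem = math.sqrt(p * (1.0 - p) / runs)
    return p, sem
```

For example, a hypothetical 36 regrowths out of 100 runs gives p = 0.36 and s.e.m. ≈ 0.048. This is of the same order as the 36 ± 5% quoted for panel a.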
Extended Data Figure 5 | Accumulation of driver and passenger genetic
alterations. a-c, The number of drivers per cell in the primary tumour plotted
as a function of time (10-100 simulations per point, error bars denote s.e.m.).
a, M = 0 and three different driver selective advantages. For s = 1%, cells
accumulate on average one driver mutation within 5 years. The time can be
significantly lower for very strong drivers (s > 1%). b, The rate at which drivers
accumulate depends mainly on their selective advantage and not on whether
they affect death or birth rate. c, Dispersal does not affect the rate of driver
accumulation. d, e, The number of passenger mutations (PMs) per cell versus
the number of driver mutations per cell. More passenger mutations are present
for smaller driver selective advantage (d), and this is independent of the
dispersal probability M (e) in the regime of small M. Data points correspond to
independent simulations.
Extended Data Figure 6 | Genetic diversity in a single lesion for different
models. a-d, Representative simulation snapshots, with genetic alterations
colour-coded as in Fig. 4. Top: s = 0; bottom: s = 1%. a, Model A from the main
text, in which cells replicate with rates proportional to the number of empty
nearby sites. b, Model B: the replication rate is constant and non-zero if there is
at least one empty site nearby, and zero otherwise. c, Model C: cells replicate at a
constant rate and push away other cells to make space for their progeny.
d, Model D: cells replicate/die only on the surface; the interior of the tumour
('necrotic core') is static. In all cases, N = 1 × 10’, d = 0.99b. e, Number of
genetic alterations present in at least 50% of cells for identical parameters as
in a-d. In all cases except surface growth (d), drivers increase genetic
homogeneity, as measured by the number of most frequent genetic alterations.
Results averaged over 50-100 simulations; error bars denote s.e.m. f, Model D
with γ_d = 2 × 10^-4 instead of 4 × 10^-5, that is, drivers occur five times
more often. In this case, driver mutations arise earlier than in d, and the
tumour becomes more homogeneous.
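The homogeneity measure in panel e, the number of genetic alterations carried by at least 50% of cells, is straightforward to compute once genotypes are available. This sketch assumes, hypothetically, that each cell's genotype is stored as a set of alteration identifiers. The function name is also hypothetical.

```python
from collections import Counter


def frequent_alterations(genomes, threshold=0.5):
    """Alterations carried by at least `threshold` of the cells.

    `genomes` is a list with one iterable of alteration IDs per cell.
    """
    counts = Counter(ga for genome in genomes for ga in set(genome))
    n = len(genomes)
    return {ga for ga, c in counts.items() if c >= threshold * n}
```

The plotted quantity is then `len(frequent_alterations(genomes))`. A larger value means more alterations are shared across most of the tumour, that is, a more homogeneous lesion.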
a,
| Parameter | Meaning | Value | References |
|---|---|---|---|
| L | Length of the simulation box | 400-800 μm | This work |
| d | Cell diameter | 10 μm (*) | [59,60,86] |
| g | Elongation speed | 0.208 μm/h (†) | This work |
| η | Dynamic viscosity of the intracellular fluid | 2-105 Pa·s | [86] |
| E_0 | Young modulus of epithelial/cancer cells | 1 kPa (*) | [86] |
| E_s | Young modulus of stroma | 1 kPa (‡) | [87, 88] |
| ν | Poisson's ratio | 1/3 (*) | [59,60,86] |
| σ | Cell-cell adhesion energy | 200 pJ/m² (*) | [86] |


Extended Data Figure 7 | The off-lattice model. a, Summary of all
parameters used in the model. Asterisk, typical value, varies between different
types of tissues; dagger, equivalent to a 24 h minimal doubling time;
double dagger, based on the assumption that macroscopic elastic
properties of tissues such as liver, pancreas or mammary glands are primarily
determined by the elastic properties of stroma. b, Simulation snapshot of a
normal tissue before the invasion of cancer cells. c, Two balls of cancer cells in
two nearby ducts repel each other as they grow, as a consequence of the mechanical
forces they exert on each other. d, The balls coalesce if growth is able to break
the separating extracellular matrix. e, If the balls are not encapsulated,
they quickly merge. f, Isolated balls of cells are not required to speed up growth;
migration (left) can cause the tumour to expand much faster even if
individual microlesions merge together, as opposed to the case with no
migration (right).
Extended Data Figure 8 | Genetic diversity quantified. a, Tumours are much
more genetically heterogeneous in the absence of driver mutations (s = 0) (see
Fig. 4). The plot shows the fraction G(r) of genetic alterations (GAs) shared
between cells as a function of their separation (distance r) in the tumour. The
fraction decreases quickly with increasing r. The distance in the figure is
normalized by the average distance <r> between any two cells in the tumour.
For a spherical tumour, <r> is approximately equal to half of the tumour
diameter. b, Fraction of shared genetic alterations for s = 1% and s = 0%,
N = 1 × 10’, and M = 1 × 10 ’. In the presence of drivers, G(r) decays more slowly,
indicating more homogeneous tumours. c, The exact value of the selective
advantage of driver mutations is not important (all curves G(r) look the same,
except for s = 0) as long as s > 0. d-f, Number of genetic alterations present
in at least 50% of cells for identical parameters as in a-c, respectively.
Drivers substantially increase the level of genetic homogeneity. In all panels,
the results have been averaged over 30-100 simulations, with error bars
as s.e.m.
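G(r) can be estimated by pooling all pairs of cells, binning them by separation normalized by the mean pairwise distance <r>, and averaging a per-pair "shared fraction" in each bin. The caption does not define the per-pair fraction exactly. This sketch assumes |A ∩ B| / |A ∪ B| (the Jaccard index) of the two cells' alteration sets, together with a fixed bin width. Both choices and all names are hypothetical.

```python
import math
from itertools import combinations


def shared_fraction(a: set, b: set) -> float:
    """Assumed per-pair measure: |A & B| / |A | B| (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def g_of_r(cells, bin_width=0.25):
    """cells: list of (position, set_of_GAs).

    Returns {bin_start: mean shared fraction}, with distances normalized by the
    mean pairwise distance <r>. Uses all pairs, so it is O(n^2) in the cell count.
    """
    if len(cells) < 2:
        raise ValueError("need at least two cells")
    pairs = [(math.dist(p, q), shared_fraction(a, b))
             for (p, a), (q, b) in combinations(cells, 2)]
    mean_r = sum(r for r, _ in pairs) / len(pairs)
    bins = {}
    for r, f in pairs:
        key = math.floor(r / mean_r / bin_width) * bin_width
        bins.setdefault(key, []).append(f)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}
```

For tumours of realistic size, one would subsample cell pairs rather than enumerate all of them.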
Extended Data Figure 9 | Growth curves for the 26-nearest-neighbour
(26n, red curves) and the 6-nearest-neighbour (6n, green curves) models.
a, Model A (as in the main text), no death. The tumour grows about half as
fast in the 6n model. Pictures show tumour snapshots for both models;
there is no visible difference in shape. b, Model A, death d = 0.8b. The
additional blue curve is for the 6n model with replication probability modified
to account for missing neighbours, as explained in the Supplementary
Information. c, Model A, with death d = 0.95b and drivers s = 5%. There is
very little difference in the growth curves between the 6n and 26n models. A
small asymmetry in the shape is caused by faster-growing cells with driver
mutations. d, Model C (exponential growth). Growth is the same in both 6n and
26n models, but the shape is more aspherical for the 6n model. This is
probably caused by shifting cells along the coordinate axes rather than along the
shortest path to the surface when making space for new cells. All plots show
the mean (average over 50-100 simulations) and s.e.m.
Extended Data Table 1 | Experimental results for the percentage of proliferating cells in the centre versus the edge of solid tumours

| # | Tumour type | Edge cells | Center cells | Ratio center:edge | p-value |
|---|---|---|---|---|---|
| 1 | Chromophobe RCC | | | | 0.05 |
| 2 | Chromophobe RCC | | | | 0.28 |
| 3 | Chromophobe RCC | | | | 0.03 |
| 4 | Chromophobe RCC | | | | 0.17 |
| 5 | HCC | | | | 0.05 |
| 6 | HCC | | | | 0.03 |
| 7 | HCC | | | | 0.09 |
| 8 | HCC | | | | 0.35 |
| 9 | HCC | | | | 0.33 |
| 10 | HCC | | | | 0.22 |
| Summary 1-4 | Chromophobe RCC | | | | 0.00002 |
| Summary 5-10 | HCC | | | | 0.007 |

A representative section of each tumour was labelled for the proliferation marker Ki67, and images of the tumour at the leading edge and the centre were acquired as described (Methods). Proliferation is
markedly increased at the leading edge, and this is statistically significant ('Summary', Kolmogorov-Smirnov two-sample test, P < 0.05). The average ratio of the number of proliferating cells in the centre to that at the
edge is 0.50 (range 0.17-0.79). HCC, hepatocellular carcinoma; RCC, renal cell carcinoma.
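The Kolmogorov-Smirnov two-sample test compares two empirical distributions through the statistic D = max_x |F_a(x) − F_b(x)|, where F_a and F_b are the empirical cumulative distribution functions. A minimal sketch of D follows. The per-image proliferation counts that would form the two samples (edge versus centre) are not given here, so any inputs are hypothetical. For a P value as well, SciPy's `scipy.stats.ks_2samp` returns both the statistic and P.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are right-continuous step functions that jump only at
    observed values, so checking every pooled observation finds the maximum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

D ranges from 0 (identical empirical distributions) to 1 (completely separated samples).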
Allosteric receptor activation by the plant peptide hormone phytosulfokine


Jizong Wang*, Hongju Li*, Zhifu Han, Heqiao Zhang, Tong Wang, Guangzhong Lin, Junbiao Chang, Weicai Yang & Jijie Chai


Phytosulfokine (PSK) is a disulfated pentapeptide that has a
ubiquitous role in plant growth and development’”. PSK is perceived
by its receptor PSKR*, a leucine-rich repeat receptor kinase (LRR-
RK). The mechanisms underlying the recognition of PSK, the
activation of PSKR and the identity of the components
downstream of the initial binding remain elusive. Here we report the
crystal structures of the extracellular LRR domain of PSKR in free,
PSK- and co-receptor-bound forms. The structures reveal that PSK
interacts mainly with a β-strand from the island domain of PSKR,
forming an anti-parallel β-sheet. The two sulfate moieties of PSK interact
directly with PSKR, sensitizing PSKR recognition of PSK.
Supported by biochemical, structural and genetic evidence, we show that PSK
binding enhances PSKR heterodimerization with the somatic
embryogenesis receptor-like kinases (SERKs). However, PSK is
not directly involved in the PSKR-SERK interaction but stabilizes the
PSKR island domain for recruitment of a SERK. Our data reveal
the structural basis for PSKR recognition of PSK and allosteric
activation of PSKR by PSK, opening up new avenues for the design
of PSKR-specific small molecules.

Peptide signalling has critical roles in regulating plant physiology’.
Phytosulfokine (PSK)° is a secreted disulfated pentapeptide
(Tyr(SO3H)-Ile-Tyr(SO3H)-Thr-Gln) that has ubiquitous roles in plant
growth and development’. PSK matures through proteolytic cleavage
of its precursor proteins® with post-translational sulfation’ for its full
activity®*°. The PSK receptor was first identified in Daucus carota (carrot)’,
and the corresponding gene DcPSKR is conserved among plants,
including Arabidopsis, which encodes two PSKR orthologues, PSKR1
(ref. 4) and PSKR2 (ref. 10); however, PSK perception largely relies on
PSKR1 (ref. 10).

DcPSKR and PSKR1/2 (PSKRs) belong to the large family of
leucine-rich repeat receptor kinases (LRR-RKs) with an extracellular LRR
domain and a cytoplasmic kinase domain (KD)"’. The extracellular
domains of the three LRR-RKs contain 21 LRRs with an island domain
(ID) required for PSK perception*”®. PSK binding induces signalling
mediated by Ca2+/CaM binding and the kinase activity of PSKR1
(ref. 12), suggesting that ligand binding activates PSKR1™”, as
observed in well-studied RKs such as flagellin-sensing 2
(FLS2) and brassinosteroid insensitive 1 (BRI1)’*. Signalling mediated
by the latter two receptor kinases requires ligand-enhanced
heterodimerization with the LRR-RK BAK1 (ref. 14), a member of the somatic
embryogenesis receptor-like kinases (SERKs) that generally act as
co-receptors with other LRR-RKs"».

We first solved the crystal structures of the PSK-PSKR1^LRR (Fig. 1a
and Extended Data Table 1) and PSK-DcPSKR^LRR (Extended Data
Fig. 1a and Extended Data Table 1) complexes. PSK adopts a β-strand
conformation, forming an anti-parallel β-sheet with the PSKR1^ID
(Fig. 1a). Besides the hydrogen bonds within the β-sheet (Fig. 1b),
PSKRIS**”, PSKRIS?, PSKRI™°S and PSKR1“?**° from the
inner side of the helical structure also form hydrogen bonds with the
main chain of PSK (Fig. 1c). Additionally, PSKRI“"°”’ and PSKRIA*°
form hydrogen bonds with the free carboxyl group of PSK“
(Fig. 1c), whereas PSKR1?"*°°° tightly packs against PSK“ and
PSK'*. The two sulfate moieties contribute to PSK-PSKR1^LRR
interactions via both hydrogen bonds involving PSKR1'*°°* and
PSKR1“*"** and van der Waals packing involving PSKRI'*”?,
PSKR1"'P“8 and PSKR1'*8 (Fig. 1b, c). The PSK-interacting
residues of PSKR1 are highly conserved in DcPSKR (Extended Data
Fig. 1b, c) and PSKR2 (Extended Data Fig. 1d), suggesting that the
three PSKRs recognize PSK in a conserved manner. Indeed, the structure
of PSK-DcPSKR^LRR is almost identical to that of PSK-PSKR1^LRR
(Fig. 1d), with an r.m.s.d. (root mean square deviation) of 1.45 Å.
Further supporting the sulfate group-mediated PSK-DcPSKR^LRR
interactions, microscale thermophoresis (MST) showed that PSK
displayed a higher binding affinity for DcPSKR^LRR than did desulfated
PSK (dPSK) (Extended Data Fig. 2a), in agreement with the observation
that dPSK promotes root elongation of Arabidopsis plants but
with lower activity than PSK®. Previous studies using microsomal
fractions derived from cells showed that the PSK-PSKR interaction has
a dissociation constant of 4.2 nM in carrot’ and 7.7 nM in
Arabidopsis*, approximately 200-370 times stronger than the affinity
measured between DcPSKR^LRR and PSK by MST. The precise reason
for the affinity difference between cell-based and in vitro quantification
assays is unclear, but it is possible that interactions between
transmembrane or cytoplasmic domains within the cellular context
provide an environment more favourable for PSK interaction with its
receptor. Assays using MST also confirmed the important role of the
critical DcPSKR^LRR residues (Extended Data Fig. 1b, c) in PSK
recognition, as their mutations compromised PSK-DcPSKR^LRR
association, albeit to varying degrees (Extended Data Fig. 2b).
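The 200-370-fold range can be checked arithmetically. If both cell-based constants are compared with a single MST value (an assumption, since the text does not pair the numbers), then the larger fold must go with the tighter 4.2 nM constant and the smaller fold with 7.7 nM. Both pairings then imply an MST dissociation constant of roughly 1.5 μM. This is a back-calculation only; the measured value is reported in Extended Data Fig. 2a. The function name is hypothetical.

```python
def implied_mst_kd_nm(kd_cell_nm: float, fold: float) -> float:
    """MST dissociation constant (nM) implied by a cell-based Kd and a fold difference."""
    return kd_cell_nm * fold


if __name__ == "__main__":
    carrot = implied_mst_kd_nm(4.2, 370)       # about 1,554 nM
    arabidopsis = implied_mst_kd_nm(7.7, 200)  # about 1,540 nM
    print(f"{carrot / 1000:.2f} uM vs {arabidopsis / 1000:.2f} uM")
```

That the two back-calculated values nearly coincide supports reading the quoted range as two ratios to a single MST value.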

As observed previously’, the pskr1-3 Arabidopsis mutants displayed
a shortened root phenotype (Fig. 1e). The phenotype was fully
complemented by wild-type (WT) PSKR1 and by a chimaeric PSKR1
carrying DcPSKR^LRR together with the transmembrane domain and KD of PSKR1,
and almost fully complemented by DcPSKR (Fig. 1e), but not by
PSKR1 constructs carrying mutations of the residues critical for the
PSK-PSKR1 interaction (Fig. 1b, c, e). Furthermore, plants carrying the
single PSKR1 mutants were less responsive to PSK than WT plants
(Extended Data Fig. 2c).
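The root-length comparisons in Figs 1e and 4d are assessed with Student's t-test. A minimal sketch of the classic two-sample statistic with pooled variance, which assumes equal variances in the two groups, is shown below. The root-length data themselves are not given in the text, so any inputs are hypothetical. SciPy's `scipy.stats.ttest_ind` computes the same statistic together with a P value.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

With n = 30 seedlings per line, as in the figure legends, each comparison has 58 degrees of freedom.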

PSK binding induced no oligomerization of PSKR1^LRR or DcPSKR^LRR
(Extended Data Fig. 3), suggesting, on the basis of the dimerization
model’*, that a co-receptor is required for their activation. PSKR1/2 and
DcPSKR belong to the same family of LRR-RKs as BRI1 (ref. 11), which
uses a SERK member as its co-receptor’’. Moreover, PSK promotes
somatic embryogenesis’®, a marker of which is DcSERK”’. These data
prompted us to hypothesize that a SERK member functions as a
co-receptor with PSKRs. Indeed, gel filtration showed that PSK induced
the formation of a complex between PSKR1^LRR and SERK1/2/BAK1^LRR
(Fig. 2a and Extended Data Fig. 4a, b). PSKR1^LRR-SERK1^LRR (Fig. 2a)
Figure 1 | Recognition mechanism of PSK by PSKR1^LRR. a, Overall structure
of the PSK-PSKR1^LRR complex. Arrow indicates the position of PSK. ID, island
domain. b, Detailed interactions of PSK (purple) with the ID (salmon) of
PSKR1^LRR. sY, sulfated tyrosine. c, Detailed interactions of PSK with the
inner surface (cyan) of PSKR1^LRR. d, Structural comparison of
PSK-PSKR1^LRR and PSK-DcPSKR^LRR. e, Reducing the PSK-PSKR1^LRR
interaction compromises the ability of PSKR1 to complement the shortened root phenotype
of pskr1-3 mutants. Average (±s.e.m.) primary root lengths of seedlings
were determined in three independent experiments. Three independent
overexpression lines (represented by -1, -2 and -3) per genotype were analysed
(n = 30 for each line, *P < 0.05, ***P < 0.001, Student's t-test).


or DcPSKR^LRR-SERK1/2^LRR (Extended Data Fig. 4c, d) was
heterodimeric in solution, as indicated by gel filtration. Further supporting the
gel filtration data, sedimentation-velocity analytical ultracentrifugation
showed that PSKR1^LRR formed a PSK-induced heterodimer with
SERK1/2^LRR or BAK1^LRR (Fig. 2b and Extended Data Fig. 5).
Co-expression of full-length Flag-conjugated PSKR1 (PSKR1-Flag)
with haemagglutinin (HA)-conjugated SERKs quickly resulted in rupture
of Arabidopsis protoplasts. We therefore used a KD-truncated
PSKR1 (PSKR1(ΔKD))-Flag and SERK1/SERK2/BAK1-HA for
co-expression in protoplasts. Co-immunoprecipitation (Co-IP) assays
showed that PSKR1(ΔKD) interacted with SERK1, SERK2 or
BAK1 in protoplasts even in the absence of PSK (Fig. 2c), probably
owing to endogenous PSK or a constitutive interaction, as
observed for the BRI1-BAK1 interaction’*. Importantly, the
PSKR1(ΔKD)-SERK interactions were substantially increased in
PSK-treated protoplasts (Fig. 2c). Similar results were also obtained
in Arabidopsis co-expressing PSKR1 and BAK1, SERK1 or SERK2
(Fig. 2d). Further supporting these results, the triple serk1/+;serk2/-;
bak1/- mutant plants (where serk1 is heterozygous and serk2 and bak1 are
Figure 2 | PSK promotes PSKR-SERK heterodimerization. a, PSK induces
PSKR1^LRR-SERK1^LRR heterodimerization. Left, superposition of the gel
filtration chromatograms of the PSKR1^LRR and SERK1^LRR proteins. The red
and black arrows indicate the elution positions of PSK-PSKR1^LRR-SERK1^LRR
and molecular weight markers, respectively. mAU, milli-absorbance units
at 280 nm. Right, Coomassie blue staining of the peak fractions
shown on the left following SDS-PAGE. M, molecular weight ladder (kDa).
b, PSK induces a monomeric PSK-PSKR1^LRR-SERK1^LRR complex in
sedimentation-velocity analytical ultracentrifugation. The peak sedimentation
coefficients and the calculated molecular weights for the proteins indicated
are shown. c, PSK promotes PSKR1-SERK interaction in Arabidopsis
protoplasts. Flag-tagged PSKR1(ΔKD) and HA-tagged SERK1/2/BAK1 were
co-expressed in WT Arabidopsis protoplasts, and their interactions were
detected by co-immunoprecipitation (Co-IP). Each assay was repeated three
times. Full blots are shown in Supplementary Data. d, PSK promotes
PSKR1-SERK interaction in planta. Crude protein extracts from treated and
untreated plants overexpressing green fluorescent protein-conjugated PSKR1
(PSKR1-GFP) and SERK1/2/BAK1-HA were used for Co-IP experiments.
Each assay was repeated three times. Full blots are shown in Supplementary
Data. e, The serk1/+;serk2/-;bak1/- triple mutants are less sensitive to
PSK in root growth. Wild-type or mutant Arabidopsis plants were grown for
10 days on plates with (+PSK) or without (CK) 1.0 μM PSK. The image is
representative of ten plants for each genotype.


homozygous) had shortened roots that were much less sensitive to PSK than
those of WT plants, phenocopying the pskr1-3 mutants (Fig. 2e).
Only slightly shorter roots were observed in the single or double
knockout plants (Extended Data Fig. 6a, b), which were still PSK-sensitive
(Extended Data Fig. 6c), suggesting functional redundancy of SERKs
in PSK-induced plant growth. It should be noted that plant
sensitivity to PSK was significantly reduced by inhibition of
brassinosteroid-induced signalling’, in which BAK1 and other SERK members play
essential roles’.

We then solved the crystal structures of the PSK-PSKR1^LRR-SERK1^LRR
(Fig. 3a and Extended Data Table 1) and PSK-DcPSKR^LRR-SERK2^LRR
(Extended Data Fig. 7a and Extended Data Table 1) complexes. The
structures of PSKR1^LRR and DcPSKR^LRR
Figure 3 | PSK stabilizes the PSKR^ID for interaction with SERK^LRR.

a, Overall structure of the PSK-PSKR1^LRR-SERK1^LRR complex. b, Structural
comparison of PSK-PSKR1^LRR-SERK1^LRR and PSK-DcPSKR^LRR-SERK2^LRR.
c, Overall structure of the free DcPSKR^LRR. d, PSK binding
stabilizes the DcPSKR^ID. Shown is the structural alignment of free DcPSKR^LRR
(residues 29-643, grey) and PSK-bound DcPSKR^LRR (residues 29-643, cyan),
with an r.m.s.d. of 0.66 Å.


are homologous (Cα r.m.s.d. of 1.49 Å over 600 amino acids), and an
equivalent surface area is buried by their interaction with SERK1^LRR
(984 Å²) and SERK2^LRR (973 Å²), respectively (Fig. 3b). SERK1^LRR
binds the carboxy-terminal side of PSKR1^LRR, whereas PSKR1^ID
contacts the amino-terminal side of SERK1^LRR (Fig. 3a). The structures of
the two complexes align well with that of the BRI1- but not the
FLS2-containing complex (Extended Data Fig. 7b, c). Unlike in the flg22-
and brassinosteroid-mediated complexes'*”°”', PSK is not directly
involved in the PSKR1^LRR-SERK1^LRR or DcPSKR^LRR-SERK2^LRR
interfaces (Fig. 3b). This seems inconsistent with the PSK-promoted
PSKR-SERK interaction. However, the structure of free DcPSKR^LRR
(Extended Data Table 1) revealed that its ID is completely disordered
(Fig. 3c and Extended Data Fig. 7d), in sharp contrast to the
well-defined ID in PSK-bound DcPSKR^LRR (Fig. 3d) or PSKR1^LRR (Fig. 1a).
This demonstrates that PSK allosterically induces the PSKR^LRR-SERK^LRR
interaction.
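The r.m.s.d. values quoted here (1.45 Å, 1.49 Å, 0.66 Å) are root-mean-square distances between corresponding atoms of two structures. A minimal sketch follows. It assumes that the structures have already been optimally superposed (for example, by a Kabsch fit, which is not shown) and that atom i in one list corresponds to atom i in the other, such as matched Cα atoms. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates.

    Assumes prior superposition and a one-to-one atom correspondence.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

Because the value depends on which atoms are matched, the caption's qualifiers such as "Cα" and "over 600 amino acids" are part of the number's meaning.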

PSKR1^ID interaction with the N-terminal side of SERK1^LRR is
mainly mediated by van der Waals contacts (Fig. 4a). Centred at this
interface is SERK1'™°, which tightly packs against PSKR1'*?!® and
PSKR1””°!®. Stacking of SERK1?"°*! against PSKR1°"°*”° further
fortifies the interactions around this interface (Fig. 4a, left panel). More
extensive PSKR1^LRR-SERK1^LRR interactions come from contacts of
the residues PSKR1?>*°°°, PSKR1S*°°8 and PSKR1™""*? on one
lateral side of PSKR1 with the inner surface of SERK1^LRR (Fig. 4a, right
panel). The PSKR1^LRR-SERK1^LRR interactions are highly conserved in
the PSK-DcPSKR^LRR-SERK2^LRR complex (Fig. 4b and Extended Data
Fig. 1d).

DcPSKR(S608Y) and DcPSKR(T629Y), predicted to generate steric
clashes with SERK2 (Fig. 4b, right panel), led to loss of the PSK-induced
DcPSKR^LRR-SERK2^LRR interaction (Fig. 4c, left panel and Extended
Data Fig. 8a, b). A similar observation was also made for
DcPSKR(F606D). Consistently, mutations of the equivalent residues
PSKRIPHS5®, pgKR1S*°8 and PSKRI! (Fig. 4a) resulted in much
less responsiveness to PSK for interaction with BAK1 in Arabidopsis
protoplasts (Fig. 4c, right panel). Furthermore, mutations of these
Figure 4 | Mutagenesis analysis of PSKR-SERK interaction. a, Detailed
interactions of the ID (salmon, left) and the C-terminal side (cyan, right) of
PSKR1^LRR with SERK1^LRR (blue). b, Detailed interactions of the ID (salmon,
left) and the C-terminal side (cyan, right) of DcPSKR^LRR with SERK2^LRR (blue).
c, Mutagenesis analysis of PSKR-SERK interaction. Left, interaction between
WT or mutant DcPSKR^LRR and SERK2^LRR in the presence of PSK, as assayed in
Fig. 2a. Right, mutations of critical PSKR1 residues render the
PSKR1(ΔKD)-BAK1 interaction less sensitive to PSK in Arabidopsis protoplasts, as assayed in
Fig. 2c. Full blots are shown in Supplementary Data. d, Reducing PSKR-SERK
interaction compromises the ability of PSKR1 to complement the shortened root
phenotypes of pskr1-3 mutants. Average (±s.e.m.) primary root lengths of
seedlings were determined in three independent experiments for each line
(n = 30, **P < 0.01, Student's t-test). e, Mutagenesis analysis of
PSKR-SERK2^LRR/BAK1^LRR interaction. Left, SERK2(T62Y) disrupted the PSK-induced
DcPSKR^LRR-SERK2^LRR interaction in solution, as assayed in Fig. 2a. Right,
BAK1(T58Y) is less sensitive to PSK for interaction with PSKR1(ΔKD), as
assayed in Fig. 2c. Full blots are shown in Supplementary Data.


PSKR1 residues but not the controls (PSKR1(S623Y) or
DcPSKR(S633Y)) (Fig. 4c and Extended Data Fig. 8a, b) reduced
the ability of PSKR1 to complement the shorter roots of pskr1-3
mutants and the responsiveness of Arabidopsis plants to PSK (Fig. 4d
and Extended Data Fig. 2c). SERK2(T62Y) is expected to have
similar effects on the PSK-promoted PSKR-SERK interaction. Indeed,
the SERK2^LRR mutant protein failed to form a PSK-induced complex
with DcPSKR^LRR (Fig. 4e, left panel and Extended Data Fig. 8a, b).
Consistently, mutation of the equivalent residue in BAK1 (Thr58; Extended
Data Fig. 8c) rendered the BAK1-PSKR1(ΔKD) interaction less
responsive to PSK than with WT BAK1 (Fig. 4e, right panel).

Our current study offers evidence that PSK promotes PSKR-SERK
heterodimerization, providing a link between PSK perception and
early intracellular signalling and further supporting the dimerization
model". Similar to brassinosteroid signalling”, PSK signalling also
negatively regulates pathogen-associated molecular pattern
(PAMP)-triggered immunity (PTI)****. However, expression of disease-related
genes was pathogen-induced in the pskr1-3 mutant plants”, similar to
the bri1 mutants”, whereas the bak1 or bak1 bkk1 (serk4) mutants
displayed constitutive immune responses even under sterile growing
conditions””®. Thus, the roles played by SERK members in plant
growth and disease resistance seem to be uncoupled, similar to those
of BAK1 in brassinosteroid and PTI signalling’’. These results can be
reconciled by a previous model”*”* postulating that SERK members
negatively regulate a brassinosteroid-independent cell-death pathway
induced by pathogens, which can be antagonized by PSK
signalling. PSK-enhanced PSKR-SERK heterodimerization can lead to
transphosphorylation of the two RKs. Indeed, the kinase activity of
PSKR1 is essential for PSK-induced plant growth in Arabidopsis”.

Unlike flg22 and brassinosteroid, which mediate interactions
between two LRR-RKs by acting as 'molecular glue', PSK functions
to stabilize the PSKR island domain (ID), which in turn recruits a SERK
member to form a stable PSKR–SERK complex, resulting in allosteric
activation of PSKR. The PSKR ID is shorter than that of BRI1, which is
well structured even in the absence of ligand. It therefore seems that
ligand binding is required to complete the PSKR ID. Indeed, structural
comparison showed that the PSKR1 ID together with PSK is positioned
similarly to the BRI1 ID (Extended Data Fig. 7b). It will be interesting to
investigate whether RLPs and some other RKs that contain an ID with
a similar size and position (relative to the last LRR) to that of PSKR
use this mechanism for interaction with their partners.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size and the experiments 
were not randomized. 

Protein expression and purification. The constructs of DcPSKR^LRR (residues
24-659), PSKR1^LRR (residues 24-648), SERK1^LRR (residues 1-213, N115D,
N163Q), SERK2^LRR (residues 1-216) and BAK1^LRR (residues 1-220) with a
C-terminal 6×His tag were generated by a standard PCR-based cloning strategy,
and their identities were confirmed by sequencing. DcPSKR^LRR and PSKR1^LRR
constructs were expressed in High Five insect cells at 22 °C using the pFastBac-1
vector (Invitrogen) with a modified N-terminal hemolin signal peptide, whereas the
SERK1^LRR, SERK2^LRR and BAK1^LRR constructs used the original pFastBac-1
vector. One litre of cells (2.0 × 10^6 cells ml^-1, cultured in medium from Expression
Systems) was infected with 20 ml of recombinant baculovirus, and the medium was
harvested after 48 h. The proteins were purified using Ni-NTA (Novagen) and
size-exclusion chromatography (Hiload200, GE Healthcare) in buffer containing
10 mM Bis-Tris pH 6.0 and 100 mM NaCl. The purified proteins were digested
with endoglycosidases F1 and F3 at 18 °C overnight and further cleaned up by
gel filtration. The deglycosylated DcPSKR^LRR and PSKR1^LRR proteins were
concentrated to about 7.0 mg ml^-1 for crystallization. To crystallize the
PSK–DcPSKR^LRR–SERK2^LRR complex, the purified DcPSKR^LRR, SERK2^LRR and the
PSK peptide (synthesized by Scilight Biotechnology, China) were mixed and
incubated at 4 °C for 20 min. The mixture was subsequently subjected to gel filtration
(Hiload200, GE Healthcare) in buffer containing 10 mM Bis-Tris pH 6.0, 100 mM
NaCl. The purified complex was concentrated to about 7.0 mg ml^-1 for
crystallization. Similar procedures were used for purification of the
PSK–PSKR1^LRR–SERK1^LRR complex.

Crystallization, data collection, structure determination and refinement.
Crystallization experiments were performed by the hanging-drop vapour-diffusion
method, mixing equal volumes (1.0 μl) of protein and reservoir solution at
18 °C. Good-quality crystals of DcPSKR^LRR were obtained in buffer containing
0.1 M Tris pH 8.5, 2.0 M (NH4)2SO4. For crystallization of the PSK–DcPSKR^LRR or
PSK–PSKR1^LRR complex, a mixture of DcPSKR^LRR or PSKR1^LRR and PSK peptide
at a molar ratio of 1:5 was used. Diffraction-quality crystals
of PSK–DcPSKR^LRR were obtained in buffer containing 0.3 M KH2PO4, 20%
PEG 2000 within 3 days, and for PSK–PSKR1^LRR, good-quality crystals appeared
in buffer containing 0.1 M Bis-Tris pH 5.5, 2.0 M (NH4)2SO4 within 6 months.
Diffraction-quality crystals of the PSK–DcPSKR^LRR–SERK2^LRR complex were
obtained in buffer containing 0.1 M sodium citrate pH 5.5, 0.4 M KCl, 30% v/v
pentaerythritol propoxylate (5/4 PO/OH) within one week, and for
PSK–PSKR1^LRR–SERK1^LRR, high-quality crystals emerged in buffer containing 0.1 M
sodium acetate pH 4.5, 2.0 M (NH4)2SO4 over 6 months. All diffraction data
were collected at the Shanghai Synchrotron Radiation Facility (SSRF) on beamline
BL17U1 using a CCD detector. The data were processed using HKL2000 (ref. 31).
The crystal structure of PSK–DcPSKR^LRR was determined by molecular replacement
(MR) with PHASER using the structure of FLS2 (PDB code: 4MN8) as the
initial search model. The MR model was built with COOT
and subsequently refined with Phenix. The other
crystal structures were determined by MR using the structure of DcPSKR^LRR as the
initial search model. All five crystal structures were refined with
Phenix, with excellent stereochemistry (Extended Data Table 1). All figures
representing structures were prepared using PyMOL.

Microscale thermophoresis assay. The microscale thermophoresis (MST) assay
was performed as previously described. The affinity of purified DcPSKR^LRR
(or its mutants) for PSK (or dPSK) was measured using the Monolith NT.115
from Nanotemper Technologies. Proteins were fluorescently labelled according to
the manufacturer's protocol, and the concentration of labelled protein used for each
assay was about 200 nM. A solution of unlabelled peptide was serially diluted to
give an appropriate concentration gradient. The samples were loaded into silica
capillaries (Polymicro Technologies) after incubation at room temperature for 30 min.
Measurements were performed at 20 °C in buffer containing 20 mM citric acid pH 5.0,
50 mM NaCl and 0.05% Tween 20, using 12% LED power and 40% MST power. The
assays were repeated three times for each affinity measurement. Data were
analysed using Nanotemper Analysis software and OriginPro 8.0 software
provided by the manufacturer.
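MST affinities are normally obtained by fitting a single-site binding model to the normalized fluorescence versus ligand concentration. The Methods do not state which model the vendor software applied, so the following is only a minimal sketch, assuming an exact 1:1 (quadratic, ligand-depletion) model with a fixed labelled-protein concentration (about 200 nM here) and a simple grid search over Kd in place of a nonlinear optimizer. The function names, the grid and the synthetic numbers are hypothetical illustrations, not the authors' analysis.

```python
import math

def fraction_bound(ligand_nm, target_nm, kd_nm):
    """Fraction of labelled target in complex for a single-site (1:1) model,
    using the exact quadratic solution that accounts for ligand depletion."""
    s = ligand_nm + target_nm + kd_nm
    complex_nm = (s - math.sqrt(s * s - 4.0 * ligand_nm * target_nm)) / 2.0
    return complex_nm / target_nm

def mst_signal(ligand_nm, target_nm, kd_nm, f_unbound, f_bound):
    """Normalized fluorescence (per mille) as a linear mix of unbound and bound states."""
    return f_unbound + (f_bound - f_unbound) * fraction_bound(ligand_nm, target_nm, kd_nm)

def fit_kd(ligands_nm, signals, target_nm, kd_grid_nm):
    """Least-squares Kd by grid search. For a fixed Kd the signal is linear in the
    two plateaus, so they are solved exactly by simple linear regression."""
    best = None
    mean_y = sum(signals) / len(signals)
    for kd in kd_grid_nm:
        fs = [fraction_bound(x, target_nm, kd) for x in ligands_nm]
        mean_f = sum(fs) / len(fs)
        sxx = sum((f - mean_f) ** 2 for f in fs)
        if sxx == 0.0:
            continue  # bound fraction does not vary: plateaus not identifiable
        sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, signals))
        amplitude = sxy / sxx  # equals f_bound - f_unbound
        f_unbound = mean_y - amplitude * mean_f
        sse = sum((f_unbound + amplitude * f - y) ** 2 for f, y in zip(fs, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, f_unbound, f_unbound + amplitude)
    if best is None:
        raise ValueError("no Kd in the grid could be fitted")
    _, kd, f_unbound, f_bound = best
    return kd, f_unbound, f_bound

if __name__ == "__main__":
    # Synthetic, noise-free example: 16-point 2-fold dilution series from 10 uM,
    # 200 nM labelled protein, true Kd = 50 nM (illustrative numbers only).
    ligands = [10000.0 / 2 ** i for i in range(16)]
    signals = [mst_signal(x, 200.0, 50.0, 880.0, 900.0) for x in ligands]
    grid = [10 ** (k / 100) for k in range(0, 501)]  # 1 nM to 100 uM, log-spaced
    print(fit_kd(ligands, signals, 200.0, grid))
```

The quadratic form matters here because the labelled protein (about 200 nM) is not negligible relative to plausible Kd values; a simple hyperbolic model would bias the fit in that regime. Real data would also need per-replicate error bars, as in Extended Data Fig. 2a, b.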

Gel filtration assay. The PSKR1^LRR and SERK1^LRR proteins purified as described
above were subjected to gel filtration analysis (Hiload200, GE Healthcare) in the
presence or absence of PSK. PSKR1^LRR, SERK1^LRR and PSK at a
molar ratio of about 1:2:3 were mixed and incubated at 4 °C for 20 min before
gel filtration in buffer containing 10 mM Bis-Tris pH 6.0, 100 mM NaCl.
Samples from relevant fractions were analysed by SDS-PAGE and visualized by
Coomassie blue staining. Similar procedures were used for the other
PSKR^LRR–SERK^LRR interaction analyses. The DcPSKR^LRR and SERK2^LRR mutants designed
to disrupt their interaction were also verified with the gel filtration assay
described above.

Sedimentation-velocity analytical ultracentrifugation. Sedimentation velocity
was performed with an XL-I analytical ultracentrifuge (Beckman Coulter)
equipped with a four-cell An-60 Ti rotor for interaction analysis of PSKR1^LRR
and SERK2^LRR in the presence or absence of PSK at 20 °C. For PSKR1^LRR and
SERK1^LRR or BAK1^LRR, an eight-cell An-50 Ti rotor was used. The molar ratio of
PSKR1^LRR, SERK^LRR proteins and PSK was about 1:2:3, and the total OD280 was about
1.0. Buffer containing 10 mM Bis-Tris pH 6.0, 100 mM NaCl was used as the
reference solution. All samples were centrifuged at 45,000 rpm. Absorbance
scans were taken at 280 nm at radial intervals of 0.003 cm.
The sedimentation coefficient distributions, c(s), and molecular weights were
calculated using SEDFIT v14.4f.
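Choosing loadings so that a 1:2:3 mixture has a summed OD280 of about 1.0 is a small Beer-Lambert calculation. A minimal sketch, assuming that absorbances of the components are additive and using hypothetical molar extinction coefficients (the real values for PSKR1^LRR, the SERK proteins and PSK are not given here and would have to be derived from their sequences); the function name is likewise illustrative:

```python
def component_concentrations(total_a280, ratio, epsilons_per_m_cm, path_cm=1.0):
    """Molar concentration of each component such that the summed A280 of the
    mixture equals total_a280 at the given molar ratio (Beer-Lambert, additive).

    A_total = path * c_base * sum(ratio_i * epsilon_i), and c_i = ratio_i * c_base.
    """
    weighted = sum(r * e for r, e in zip(ratio, epsilons_per_m_cm))
    if weighted <= 0:
        raise ValueError("at least one component must absorb at 280 nm")
    c_base = total_a280 / (weighted * path_cm)
    return [r * c_base for r in ratio]

# Hypothetical extinction coefficients (M^-1 cm^-1) for receptor, co-receptor, peptide.
print(component_concentrations(1.0, (1, 2, 3), (100000.0, 50000.0, 0.0)))
```

With these placeholder coefficients the receptor would be loaded at 5 μM, the co-receptor at 10 μM and the peptide at 15 μM; the path length of the actual centrifuge cells should be substituted for the 1 cm default.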

Plant materials and growth conditions. Arabidopsis thaliana wild-type Col-0
and pskr1-3 (SALK_008585) were obtained from the Arabidopsis Biological Resource
Center; pskr1-3 has been reported to be a null mutant. serk1-8, serk2-1, bak1-4 and the
serk1-8−/+ serk2-1−/− bak1-4−/− triple mutant were generously provided by J. Li, and each
single mutant has been identified as null. Seeds were surface sterilized for 5
min in 20% NaClO, washed five times with sterile H2O and dispersed
on 1/2 Murashige & Skoog (MS) medium containing 1% agar and 10 g l^-1 sucrose,
pH 5.8, in Petri dishes. For PSK treatment of seedlings, PSK was added to the MS
medium at different final concentrations. The sterilized seeds were stratified for 3
days at 4 °C and grown for 10 days under normal conditions (16 h light/8 h dark,
22-23 °C).

Generation of constructs and plant transformation. For stable transgenic
plants, PSKR1 coding sequences carrying different site mutations were subcloned
into the pDONR207 (Invitrogen) vector and then transferred to the destination vector
pWA43 or pWA53 by Gateway recombination (for PSKR1, DcPSKR, F506A, R300A,
W448A, T398L and D445A, the final target vector is pWA43; for F596D, S598Y,
T619Y, S623Y and the DcPSKR–PSKR1 chimeric construct, the target vector is pWA53).
pWA43 (hygromycin resistant in plants) and pWA53 (kanamycin resistant in plants)
contain a CaMV 35S promoter-driven C-terminal GFP coding sequence, with the
recombination sites in between, terminated by a 35S terminator. For constructs
used for transient protoplast transformation, the truncated PSKR1 coding sequence
with the kinase domain deleted (PSKR1(ΔKD)) was fused with a C-terminal 3×Flag
affinity tag and inserted into the backbone of pBSK-35S:35STerminator after
digestion with SmaI. For SERK1, SERK2, BAK1 and BAK1(T58Y) transient expression,
the full-length coding sequences were inserted into pUC-SPYCE, which contains a
C-terminal haemagglutinin (HA) affinity tag, after digestion with SmaI. For
co-expression in planta, SERK1, SERK2 and BAK1 were inserted into pSPYCE-35S
(kanamycin resistant in plants), which shares the same framework as pUC-SPYCE, and
transformed into the T1 generation of pWA43-PSKR1 plants in the pskr1-3 background.
The transgenic plants were isolated by double selection on MS medium containing
kanamycin and hygromycin. Arabidopsis was transformed with these constructs via
Agrobacterium tumefaciens (GV3101) by the floral dip method.

Root length measurement and statistical analysis. For each construct, ten
transgenic overexpression lines in the pskr1-3 mutant background were analysed, and
three lines representative of all lines were selected for presentation. Primary root
lengths of 10-day-old seedlings, grown in the greenhouse from lines in which PSKR1
transcripts were detected, were measured from photographs using ImageJ
(National Institutes of Health, http://rsb.info.nih.gov/ij). To keep seed
fitness consistent, only seeds collected at the same time were used for the assay.
For each genotype, three independent experiments were performed. Student's t-test
was used to test the statistical significance of differences between means.
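The root-length figures report means ± s.e.m. and Student's t-test. A minimal sketch of both quantities, assuming the classic two-sample t-test with pooled (equal) variances, which is what "Student's t-test" usually denotes; the function names are hypothetical. Turning t into a P value needs the t distribution (for example `scipy.stats.ttest_ind`, whose default `equal_var=True` gives the same statistic), which is left out to keep the sketch dependency-free.

```python
import math
import statistics

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def students_t(a, b):
    """Two-sample Student's t statistic assuming equal variances (pooled).
    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, na + nb - 2

# Placeholder root lengths (cm) for two genotypes; real assays used 30 seedlings each.
wild_type = [3.1, 2.9, 3.3, 3.0, 3.2]
mutant = [2.2, 2.4, 2.1, 2.5, 2.3]
print(mean_sem(wild_type), students_t(wild_type, mutant))
```

With 30 seedlings per genotype the comparison has 58 degrees of freedom.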

PSK treatment and co-immunoprecipitation assay. Protoplast transformation
was performed according to the reported method, and protoplasts were cultured for
12 h at 22 °C. For each transformation, the culture of transformed protoplasts was
divided equally into two 50 ml centrifuge tubes. PSK peptide (diluted in H2O) was
added to a final concentration of 1.0 μM in one tube, and the same volume of H2O was
added to the other tube as a mock treatment. After 15 min of treatment, the cells
were harvested and lysed for 2 min in lysis buffer (50 mM HEPES-KOH pH
7.5, 0.15 M KCl, 0.001 M EDTA, 0.1% Triton X-100, 0.001 M DTT with freshly
added protease inhibitor cocktail (Roche)). The lysate was centrifuged at
10,000g for 10 min, and the supernatant was subjected to co-immunoprecipitation
(Co-IP) with agarose-conjugated anti-Flag antibody (Sigma-Aldrich, Cat. A220)
for 3 h at 4 °C. The agarose beads were washed six times with lysis buffer,
resuspended in 1× sample loading buffer and boiled for 5 min before SDS-PAGE.
Subsequent immunoblotting was performed according to standard procedures with
anti-Flag (Sigma-Aldrich, Cat. F1804) and anti-HA antibody (Santa Cruz, Cat.
sc-7392). For Co-IP in planta, equal amounts of 14-day-old seedlings from the same
transgenic lines (overexpressing PSKR1-GFP and SERK1/2/3-HA, or PSKR1-GFP alone
as a negative control) were treated on MS medium supplemented with 1.0 μM PSK or
on MS medium without PSK for 12 h. Then 5 g of the treated and untreated seedlings
were collected and lysed for Co-IP. GFP-Trap agarose beads (ChromoTek, Cat. gta-200)
were used for affinity binding of the PSKR1-GFP fusion protein; anti-GFP-HRP (Miltenyi
Biotec, Cat. 130-091-833) was used to detect the GFP epitope and anti-HA antibody
to detect the HA epitope. Each Co-IP experiment was repeated at least three times.
Extended Data Figure 1 | Recognition mechanism of PSK by PSKRs is
highly conserved. a, Overall structure of the PSK–DcPSKR^LRR complex. The
sulfated tyrosines of PSK are shown as sticks. Colour codes are indicated. ID,
island domain; N, N terminus; C, C terminus. b, Detailed interactions between
PSK (purple) and the island domain (salmon) of DcPSKR^LRR. Dashed lines
indicate polar interactions. c, Detailed interactions between PSK and the inner
side (cyan) of DcPSKR^LRR. d, PSKRs are conserved in PSK perception and
interaction with SERKs. Sequence alignment of the ectodomains of carrot
DcPSKR and Arabidopsis PSKR1/2. Conserved and similar residues are boxed
with red background and red font, respectively. Residues involved in recognition
of PSK and interaction with a SERK member are indicated with blue solid
circles and squares at the bottom, respectively.
Extended Data Figure 2 | Mutagenesis analysis of PSKR recognition of PSK
and PSKR–SERK interaction. a, Sulfation enhances PSK interaction with
DcPSKR^LRR. Quantification of binding affinity between DcPSKR^LRR and PSK
or the desulfated peptide (dPSK) by MST (microscale thermophoresis). Data
points indicate the difference in normalized fluorescence (‰) generated by
PSK or dPSK binding to DcPSKR^LRR protein, and curves indicate the calculated
fits. Error bars represent standard error of 3 independent measurements.
b, Mutagenesis analysis of DcPSKR^LRR by MST. Quantification of binding
affinity between PSK and WT DcPSKR^LRR or the various mutants indicated.
Error bars represent standard error of 3 independent measurements.
c, pskr1-3 plants transformed with mutated PSKR1 that is compromised in PSK or
SERK binding are less responsive to PSK than wild type or pskr1-3
transformed with PSKR1. The lines were the same as those used in Figs 1e and 4d.
Average (±s.e.m.) primary root lengths of seedlings were determined in
three independent experiments with 30 seedlings analysed per genotype in the
presence or absence of 1.0 μM PSK.
Extended Data Figure 3 | PSK binding induces no oligomerization of
PSKR^LRR. Shown on the top is a superposition of the gel filtration
chromatograms of the PSKR1^LRR (left) or DcPSKR^LRR (right) protein in the
absence (grey) and presence (red) of PSK. The vertical and horizontal axes
represent ultraviolet absorbance (A280, mAU) and elution volume (ml),
respectively. Bottom, Coomassie blue staining of the peak fractions shown on
the top following SDS-PAGE. M, molecular weight ladder (kDa).
Extended Data Figure 4 | PSK induces PSKR1^LRR or DcPSKR^LRR
interaction with SERK members in gel filtration. a, PSK induces
PSKR1^LRR–SERK2^LRR heterodimerization. Right, Coomassie blue staining of the peak
fractions shown on the left following SDS-PAGE. M, molecular weight ladder
(kDa). b, PSK induces PSKR1^LRR heterodimerization with BAK1^LRR. The assay
was performed as described in a. c, PSK induces DcPSKR^LRR
heterodimerization with SERK1^LRR. The assays were performed as described in
a. The red and black arrows indicate the elution position of
PSK–DcPSKR^LRR–SERK1^LRR and the retention volumes of molecular weight markers,
respectively. d, PSK induces DcPSKR^LRR heterodimerization with SERK2^LRR.
The assay was performed as described in a.
Extended Data Figure 5 | PSK induces PSKR1^LRR interaction with SERK
members in sedimentation-velocity analytical ultracentrifugation. PSK
induces PSKR1^LRR–SERK2^LRR (left panel) or PSKR1^LRR–BAK1^LRR (right
panel) interaction in sedimentation-velocity analytical ultracentrifugation
assays. The assays were performed as described in Fig. 2b. The glycoprotein
nature of PSKR1^LRR may account for the slight difference in calculated molecular
weights. PSK induced the formation of a monomeric PSK–PSKR1^LRR–SERK2^LRR
or PSK–PSKR1^LRR–BAK1^LRR complex, leading to the shift of PSKR1^LRR to a higher S.
Extended Data Figure 6 | SERK members function redundantly in PSK-induced
plant growth. a–c, Average (±s.e.m.) primary root lengths of seedlings were
determined for wild-type or SERK knockout Arabidopsis plants grown for 10 days
on plates with (red) or without (blue) 1.0 μM PSK. Three independent experiments
per genotype with 30 seedlings were performed. The statistics are shown in a, b
and c. All genotypes are compared in the absence of PSK in a and in the presence
of PSK in b. The single or double SERK knockout plants showed only slightly
shortened roots compared with the triple mutants. Asterisks within the bars
indicate significant differences between the wild type and SERK knockout mutants,
and those above the bars indicate significant differences between different SERK
knockout mutants. Each genotype in the presence and absence of PSK is compared in
c. Student's t-test, *P < 0.05, **P < 0.01, ***P < 0.001. NS, non-significant
(P > 0.05).
Extended Data Figure 7 | Different mechanism of PSK-induced PSKR–SERK
interaction compared with the BRI1–BAK1 or FLS2–BAK1 complex.
a, Overall structure of the PSK–DcPSKR^LRR–SERK2^LRR complex. b, Structural
comparison of PSK–PSKR1^LRR–SERK1^LRR and brassinosteroid–BRI1^LRR–BAK1^LRR.
The structure of PSKR1^LRR (residues 77-634) was used as the template for
alignment with that of BRI1 (residues 174-766; PDB code 4M7E), with an r.m.s.d.
of 2.43 Å. c, Structural comparison of PSK–PSKR1^LRR–SERK1^LRR and
flg22–FLS2^LRR–BAK1^LRR. The structure of PSKR1^LRR (residues 82-554) was used as
the template for alignment with that of FLS2 (residues 79-509; PDB code 4MN8),
with an r.m.s.d. of 4.4 Å. SERK1^LRR bound by PSKR1^LRR rotates about 30 degrees
and shifts about 20 Å relative to the BAK1^LRR-bound FLS2^LRR. d, Electron density
around the island domain of DcPSKR^LRR and PSK-bound DcPSKR^LRR in the finally
refined structures. Top panel, electron density 2Fo − Fc (left) and Fo − Fc (right)
contoured at 1.30σ and 2.7σ, respectively, for the finally refined free DcPSKR^LRR
structure. Bottom panel, 2Fo − Fc (left) and Fo − Fc (right) omit electron density
around the island domain in the structure of PSK-bound DcPSKR^LRR. The island
domain (residues 511-535) and the β-sheet (residues 474-480, 450-456, 427-432,
402-408, 376-381 and 352-357) interacting with the ID were not included in
refinement and electron density calculation. All the deleted residues are shown
in pink. The marker residue proline 536 is shown in red.
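The r.m.s.d. values in this legend (2.43 Å and 4.4 Å) summarize how closely matched atoms of two structures coincide after superposition. As a minimal sketch, assuming the residue correspondence and the optimal rigid-body superposition (for example by the Kabsch algorithm) have already been computed by the alignment program, the final value is just the square root of the mean squared distance between paired atoms. The function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed atoms
    (e.g. matched C-alpha positions), in the same units as the input (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two toy atoms offset by 1 Å along z give an r.m.s.d. of 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```

Because the value depends on which residues are paired (here residues 77-634 of PSKR1 against 174-766 of BRI1, or 82-554 against FLS2 79-509), r.m.s.d. values from different alignments are not directly comparable.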
Extended Data Figure 8 | Mutagenesis analysis of DcPSKR^LRR–SERK2^LRR
interaction. a, Superposition of the gel filtration chromatograms of the mutant
DcPSKR^LRR and SERK2^LRR proteins in the presence of PSK. The assays
were performed as described in Extended Data Fig. 4a. b, Coomassie blue
staining of the peak fractions shown on the left chromatograms following
SDS-PAGE. M, molecular weight ladder (kDa). c, The amino acids of SERKs
involved in PSKR interaction are conserved. Sequence alignment of the
ectodomains of SERK family proteins. Conserved and similar residues are
boxed with red background and red font, respectively. Residues involved in
interaction with PSKR are indicated with blue solid squares at the bottom.
The sequence of SERK3 is 100% identical to BAK1.
Extended Data Table 1 | Data collection and refinement statistics

Data set | DcPSKR^LRR | PSK–DcPSKR^LRR | PSK–DcPSKR^LRR–SERK2^LRR | PSK–PSKR1^LRR | PSK–PSKR1^LRR–SERK1^LRR

Data collection
Wavelength (Å) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Resolution (Å) | 99.0-2.9 (2.95-2.9) | 99.0-2.2 (2.24-2.2) | 99.0-2.75 (2.8-2.75) | 99.0-2.5 (2.54-2.5) | 99.0-2.65 (2.7-2.65)
Space group | C2221 | P1 | C2 | P4322 | C2221
a, b, c (Å) | 90.0, 98.8, 227.3 | 66.7, 75.7, 93.9 | 486.2, 73.5, 67.3 | 92.9, 92.9, 242.5 | 152.5, 220.9, 105.4
α, β, γ (°) | 90.0, 90.0, 90.0 | 111.3, 105.7, 97.2 | 90.0, 95.8, 90.0 | 90.0, 90.0, 90.0 | 90.0, 90.0, 90.0
Unique reflections | 21,448 (1,019) | 78,850 (2,566) | 54,320 (2,762) | 36,529 (1,779) | 49,329 (2,490)
Completeness | 93.6% (92.6%) | 97.3% (87.0%) | 88.8% (87.0%) | 98.4% (99.6%) | 96.8% (98.8%)
Rsym (%) | 12.6 (47.4) | 7.4 (54.9) | 7.9 (40.9) | 10.3 (56.6) | 10.0 (48.4)
Redundancy | 5.3 (5.3) | 3.9 (3.9) | 2.0 (1.9) | 5.3 (5.7) | 4.2 (4.2)
I/σ(I) | 18.5 (6.4) | 17.9 (5.9) | 14.7 (3.1) | 26.0 (4.4) | 15.5 (2.65)

Statistics for refinement
Resolution (Å) | 99-2.9 (3.03-2.9) | 99.0-2.2 (2.23-2.20) | 99.0-2.75 (2.8-2.75) | 99.0-2.5 (2.58-2.5) | 99.0-2.66 (2.71-2.66)
No. of RFs | 21,401 (2,391) | 78,850 (2,566) | 54,303 (2,613) | 36,369 (2,715) | 49,253 (2,282)
Completeness | 93.4% | 97.1% | 87.7% | 98.1% | 95.7%
Rwork/Rfree (%) | 20.2 (24.9)/25.4 (32.6) | 21.2 (29.0)/26.2 (33.7) | 20.0 (24.2)/26.2 (38.3) | 22.9 (26.7)/27.6 (35.1) | 20.0 (28.6)/24.6 (36.7)
R.m.s.d. bond lengths (Å) | 0.008 | 0.008 | 0.008 | 0.008 | 0.008
R.m.s.d. bond angles (°) | 1.418 | 1.389 | 1.445 | 1.481 | 1.265
Ramachandran plot, favoured | 96.3% | 98.1% | 96.6% | 96.6% | 89.6%
Ramachandran plot, allowed | 3.7% | 1.8% | 3.3% | 3.3% | 9.4%
Ramachandran plot, outliers | 0.0% | 0.1% | 0.1% | 0.1% | 1.0%

RF, reflection. $R_{\mathrm{sym}} = \sum_h \sum_j |I_{h,j} - \langle I_h \rangle| / \sum_h \sum_j I_{h,j}$, where $\langle I_h \rangle$ is the mean intensity of the $j$ observations of symmetry-related reflections of $h$. $R = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum F_{\mathrm{obs}}$, where $F_{\mathrm{obs}} = F_P$ and $F_{\mathrm{calc}}$ is the calculated protein structure factor from the atomic model. R.m.s.d. in bond lengths and angles are the deviations from ideal values.
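The two agreement indices defined in the table footnote are plain ratios of sums. A minimal sketch, assuming the structure-factor amplitudes are already on a common scale and the intensities are already grouped by unique reflection (refinement programs such as Phenix also handle scaling and, for Rfree, restrict the sum to a test set of reflections excluded from refinement); the function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum |Fobs - Fcalc| / sum Fobs over the same set of reflections."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)

def r_sym(observations):
    """Rsym = sum_h sum_j |I_hj - <I_h>| / sum_h sum_j I_hj.

    `observations` maps each unique reflection h to the list of intensities
    measured for its symmetry-related copies."""
    numerator = denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

# Toy numbers only: two reflections with model amplitudes 9 and 22 vs observed 10 and 20.
print(r_factor([10.0, 20.0], [9.0, 22.0]))
print(r_sym({(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0]}))
```

Multiplying either ratio by 100 gives the percentages reported in the table.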
Structural basis of JAZ repression of MYC
transcription factors in jasmonate signalling

Feng Zhang, Jian Yao, Jiyuan Ke, Li Zhang, Vinh Q. Lam, Xiu-Fang Xin, X. Edward Zhou, Jian Chen,
Joseph Brunzelle, Patrick R. Griffin, Mingguo Zhou, H. Eric Xu, Karsten Melcher & Sheng Yang He

The plant hormone jasmonate plays crucial roles in regulating plant
responses to herbivorous insects and microbial pathogens and is an
important regulator of plant growth and development. Key mediators
of jasmonate signalling include MYC transcription factors,
which are repressed by jasmonate ZIM-domain (JAZ) transcriptional
repressors in the resting state. In the presence of active jasmonate,
JAZ proteins function as jasmonate co-receptors by forming a
hormone-dependent complex with COI1, the F-box subunit of an
SCF-type ubiquitin E3 ligase. The hormone-dependent formation
of the COI1-JAZ co-receptor complex leads to ubiquitination
and proteasome-dependent degradation of JAZ repressors and
release of MYC proteins from transcriptional repression. The
mechanism by which JAZ proteins repress MYC transcription factors,
and how JAZ proteins switch between the repressor function in
the absence of hormone and the co-receptor function in the presence
of hormone, remain enigmatic. Here we show that Arabidopsis
MYC3 undergoes pronounced conformational changes when bound
to the conserved Jas motif of the JAZ9 repressor. The Jas motif,
previously shown to bind to hormone as a partly unwound helix,
forms a complete α-helix that displaces the amino (N)-terminal helix
of MYC3 and becomes an integral part of the MYC N-terminal fold.
In this position, the Jas helix competitively inhibits MYC3
interaction with the MED25 subunit of the transcriptional Mediator
complex. Our structural and functional studies elucidate a dynamic
molecular switch mechanism that governs the repression and
activation of a major plant hormone pathway.

To understand the structural basis of the interactions between MYC
transcription factors and JAZ repressors, we first used yeast two-
hybrid assays to determine the JAZ-binding regions within MYC2,
MYC3 and MYC4. A conserved region of ~200 amino acids (amino
acids 55-259, 44-234 and 55-253 in MYC2, MYC3 and MYC4,
respectively) within the N termini of all three proteins, encompassing
the previously defined JAZ-interacting domain (JID) and
the transcription activation domain (TAD), was sufficient to interact
with JAZ9 (Extended Data Figs 1a and 2a). Similarly, we identified
a region of 17 amino acids within the Jas motif of JAZ9 (polyA-Jas) that
is necessary and sufficient to interact with MYC3 (Extended Data Fig. 1b).
Interestingly, this Jas motif shares the same segment of JAZ proteins that
interacts with COI1 (ref. 16), but is four amino acids shorter at the N
terminus (Extended Data Fig. 1c). We confirmed these results using
AlphaScreen luminescence proximity assays with His6-tagged MYC
proteins and biotinylated JAZ8, JAZ9 and JAZ12 peptides (Extended Data
Figs 1d and 2b).

On the basis of our mapping results, we generated 15 MYC2/3/4
N-terminal truncated proteins of various lengths (Extended Data
Figs 1d and 2b). MYC3(44-238) and MYC3(5-242) yielded high-
quality crystals that diffracted X-rays to 2.2 Å and 2.1 Å resolution,
respectively (Extended Data Table 1). We solved the structure
of selenomethionine-modified MYC3(44-238) by selenium single-
wavelength anomalous diffraction (Se-SAD) and the structure of
MYC3(5-242) by molecular replacement using the structure of
MYC3(44-238) as the search model (Fig. 1a, b and Extended Data
Fig. 3). The proteins formed a helix-sheet-helix sandwich fold, in
which eight α-helices are wrapped around a central five-stranded
antiparallel β-sheet (Fig. 1a). Remarkably, whereas a hallmark of acidic TADs
is that they are unstructured when not bound to a target in the
transcriptional machinery, the MYC3 TAD is well resolved and forms
a loop-helix-loop-helix motif that packs against the JID with the
N-terminal TAD helix and against β-strands 3-5 with the carboxy
(C)-terminal TAD helix (Fig. 1a, b and Extended Data Fig. 3). To
our knowledge, this is the first example in which a non-complexed
acidic TAD has a well-resolved structure. The JID consists of the top
(β2) strand of the β-sheet, the long α3-helix and two unresolved
linkers (Fig. 1a, b and Extended Data Fig. 3a). In MYC3(5-242), the JID
forms a groove together with the α4-helix of the TAD. The N-terminal
MYC helix (α1) is connected by a sharp ~90° kink to a loop
that adopts a partial, stretched-out helical conformation (α1′, amino
acids 6-16) that occupies the groove formed by the JID and TAD
to cap the central β-sheet (Fig. 1a and Extended Data Fig. 3a). In
N-terminally truncated MYC3 (MYC3(44-238), which lacks
α1′ + α1), the JID rearranges to adopt a position similar to that of
α1′ in MYC3(5-242), substituting for α1′ in capping the β-sheet
(Fig. 1b). We performed hydrogen-deuterium exchange (HDX)
experiments to probe the surface accessibility and structural dynamics of
MYC3(5-242) in solution (Extended Data Fig. 4). Whereas the central
β-sheet has a very stable structure and is well protected from
deuterium exchange, the α1/α1′ helix region has a very high deuterium
exchange rate, suggesting that it is highly dynamic and
forms only transiently in solution. This is consistent with the high
B-factor values of the α1/α1′ helix in the MYC3(5-242) crystal
structure (Extended Data Fig. 5). Although peptides corresponding to the JID
helix were not resolved in HDX experiments, the JID helix also has a
high B-factor (Extended Data Fig. 5), indicating that its position is
dynamic as well. The MYC3(5-242) and MYC3(44-238) apo crystal
structures therefore probably represent snapshots of two or
more alternative MYC3 conformations in solution.

Figure 1 | Structures of MYC3 N terminus in apo and Jas peptide-bound
states. a, Apo MYC3(5-242). b, Apo MYC3(44-238). c, MYC3(5-242) bound
to the 22-amino-acid Jas^JAZ9 motif peptide. Blue, JID; green, TAD; pink, Jas
peptide. Dotted lines indicate the unresolved linkers that flank α1′/α1 and
the JID helices. d, MYC3(44-238) bound to the 22-amino-acid Jas^JAZ9
motif peptide. e, f, Overlay of the MYC3(44-238)–JAZ9 complex with apo
MYC3(5-242) (e) and apo MYC3(44-238) (f). The complex structure is shown
in colour overlaid on the apo structures in grey. Binding of the JAZ9 Jas22
helix (pink) displaces the α1′ helix to become an integral part of the fold.

To crystallize a complex between MYC and the Jas motif, we synthesized
a set of nine JAZ8, JAZ9 and JAZ12 Jas peptides of different
lengths and complexed them with the above-mentioned set of
MYC N-terminal proteins. After extensive trials, we succeeded
in obtaining crystals for MYC3(44-238) in complex with a
22-amino-acid Jas peptide (S218-M239) from JAZ9 (Jas22^JAZ9).
However, no crystals for MYC3(5-242)-Jas peptide complexes 
could be obtained, suggesting that Jas complexes with «1'/o1 helix- 
containing MYC3 are less stable and/or conformationally more 
dynamic. To test this hypothesis, we generated a covalent fusion 
between MYC3(5-242) and Jas22/4”° separated by a 12-amino-acid 
flexible linker to increase the stability of the MYC3(5-242) and 
Jas22)4”° complex and reduce conformational flexibility. This fusion 
protein formed high-quality crystals and allowed us to solve the struc- 
ture of the MYC3(5-242)-Jas22/4” complex (Fig. 1c) at a resolution 
of 2.4A (Extended Data Table 1). The most striking aspect of the 
MYC3(5-242)-Jas22)“” complex is that the Jas peptide formed a 
single, continuous helix that displaced the dynamic «1'/a1 helix in 
apo MYC3(5-242) (Supplementary Video 1). Correspondingly, the 
JID helix rearranged its conformation and the displaced «1'/o1 helix 
became almost completely disordered, suggesting the increase in dis- 
order as the likely reason for the recalcitrance of this complex to 
crystallize. The Jas helix adopted a position in the groove that is almost 
superimposable with that of «1’ (Fig. le). In this position the Jas helix 
is nestled between the «4-helix of the TAD and the strand and helix of 
the JID to make extensive interactions with both the JID and the TAD 
and to become an integral part of the structural fold. In addition, we 
also determined the structure of MYC3(44-238), which lacks the α1′/α1
helix, in complex with the Jas22^JAZ9 peptide at a resolution of 1.95 Å
(Extended Data Table 1). In this structure, the Jas peptide also adopted
a helical conformation and bound to the JID/TAD groove in the same
way as in the MYC3(5-242)-Jas22^JAZ9 fusion complex (Fig. 1d, f).
Both α1′/α1 and the JID helix have long, unresolved linkers that
appear to provide the flexibility needed for their displacement or
rearrangement by the Jas helix. Together, our analyses of the MYC3 apo
and MYC3-JAZ complex structures indicate that occupancy of the groove
(either by α1′ or the JID helix in apo MYC3, or by the Jas helix
in the MYC-JAZ complex) is critical for formation of the overall
MYC N-terminal fold.

In the JAZ9-MYC3 complex, the JAZ peptide forms five main 
interaction networks with the TAD-JID surface (Fig. 2 and 
Extended Data Table 2): (1) R234 of JAZ9 forms salt bridges 
and hydrogen bonds with three glutamate residues (E142, E143 
and E148) of the TAD; (2) R229 and S226 form charge and hydrogen 
bond interactions with JID D94 and W92; (3) L227 and L231 form a 
core hydrophobic network with TAD L152 and JID I122 and L125;
(4) F230 interacts with JID Y97 and TAD F151 in an aromatic ring 
network; and (5) R223 and L227 have both hydrogen bond and/or 
hydrophobic interactions with TAD M155. In contrast, the first 
five amino acids (SVPQA) of the Jas motif that are critical for the 
co-receptor function of JAZ proteins (that is, hormone-dependent 
binding to COI1 (ref. 16)) made no critical interactions with MYC3, 
consistent with our yeast two-hybrid and AlphaScreen data (Extended 
Data Figs 1b, d and 2b). Consistent with the structural data, mutational 
analysis showed that key interface residues of Jas, JID and TAD have 
important roles in MYC-JAZ interactions (Fig. 3a, b and Extended 
Data Fig. 6). In addition, the structure also provides an explanation for 
the partial in vivo relief of MYC3 repression by the MYC3^D94N mutation
observed previously, as MYC3^D94N lost interaction with a subset
of JAZs, including JAZ3, JAZ4 and JAZ9 (Extended Data Fig. 6).

Next, we transfected the MYC-responsive pJAZ2::GUS reporter
together with wild-type and mutant MYC3 expression plasmids into 
Arabidopsis protoplasts. As shown in Fig. 3c, mutant MYC3 proteins 
that were defective in interaction with multiple JAZ proteins 
(Extended Data Fig. 6) were partly relieved in repression (that is, 
increased reporter gene activity). Moreover, the extent to which mutations
compromised MYC3 interactions with JAZ proteins correlated
with the increase in reporter gene activity and the magnitude of 
changes in reporter gene activity could be further accentuated by 
expressing MYC3 mutant proteins from the strong cauliflower mosaic 
virus 35S promoter in coi1-30 mutant protoplasts, in which all JAZ 
repressors are presumably stabilized (Fig. 3d). Together, these data 
validate the MYC3-JAZ9 complex structure and provide strong evid- 
ence that amino-acid interactions identified in the MYC3-JAZ9 com- 
plex structure are important for MYC3 repression in planta. 

The Jas motif is required both for the repressor function of JAZ proteins,
through interaction with MYC, and for their co-receptor function, through
interaction with COI1 (ref. 22). Whereas the Jas^JAZ9 peptide in the MYC3
complex formed a continuous helix (Fig. 1c, d), representing the resting state
of JAZ, the Jas^JAZ1 peptide in the previously determined COI1-jasmonate-Ile-Jas
co-receptor structure (Protein Data Bank accession number 3OGL) adopted
a bipartite conformation, with an N-terminal part stretched to form a
distinct loop region followed by a shorter C-terminal helix, as illustrated
by the structural alignment in Fig. 4a, b and Supplementary Video 2. In
addition to the Jas^JAZ9-MYC3 complex, we solved the structure of the
Jas^JAZ1-MYC3 complex. As shown in the structural alignment in Extended
Data Fig. 7, the Jas helices of JAZ9 and JAZ1 overlap very well, suggesting
that the conformational change of Jas between the MYC-bound (resting) state
and the COI1-bound (hormone-activated) state is probably common to MYC
interactions with different JAZ transcriptional repressors.

In the Jas^JAZ1-COI1 complex, the loop region of the Jas^JAZ1 peptide
is formed by the five moderately conserved N-terminal amino 
Figure 2 | Jas peptide forms extensive interactions with the JID-TAD
surface in MYC3. JID-TAD-Jas structure with important interacting residues
shown in stick representation. Details of key interaction networks are shown
in solid boxes (hydrophobic interaction networks) or dashed boxes (charge
interaction networks). For clarity, not all interacting residues shown in the
non-boxed overview figure are shown in the detail boxes.


acids of the Jas motif (Extended Data Fig. 1c) that directly interact 
with the jasmonate-Ile hormone (Fig. 4a, b) and is required for 
Jas-jasmonate-Ile-COI1 co-receptor complex formation. When we
mutated the corresponding N-terminal amino acids of JAZ9 to alanine
(JAZ9-4A and JAZ9-AA; Fig. 4c), JAZ9 lost interaction with COI1 in
yeast two-hybrid assays, but not with MYC3 (Fig. 4d), consistent with the
MYC3 complex structure. Mutations in the middle of the Jas motif
(S226A-R234A) affected binding to both MYC3 and COI1, albeit to
different degrees. In addition, residues C-terminal to the Jas
motif enhanced JAZ9 interaction with COI1 in yeast two-hybrid assays,
but were not critical for its interaction with MYC3 (Fig. 4e, f), consistent
with a previous study of JAZ2, JAZ3 and JAZ10 interactions
with COI1 and MYC2 (ref. 23). Together, these results indicate that COI1
and MYC3 potentially compete for binding to the central part of the Jas
motif, but that COI1 makes additional critical interactions with JAZ9
outside the MYC3-interacting region, including the previously unrecognized
hormone-dependent unwinding of the N-terminal helix of the Jas
motif (Supplementary Video 2). These additional interactions may allow
COI1 to drive JAZ ubiquitination and dissociation of the extensive
JAZ-MYC interaction upon jasmonate-Ile stimulation.

MED25 is a subunit of the Mediator complex that recruits RNA 
polymerase II to the promoters of jasmonate-responsive genes and is
required for various jasmonate responses, including Arabidopsis
susceptibility to Pseudomonas syringae bacterial infection (Extended 
Data Fig. 7c) and jasmonate-induced inhibition of Arabidopsis root 
growth (Extended Data Fig. 7d). We found that MYC3(44-238) also 
directly binds MED25 and that a fragment (amino acids 540-680) 
encompassing the MED25 activator interaction domain (ACID) is 
sufficient to bind to MYC3 (Fig. 5a), analogous to what has previously 
been reported for MYC2 (ref. 13). Since the MYC3 TAD makes critical 
Figure 3 | Mutational analysis of the JAZ9-MYC3 interaction. a, Yeast
two-hybrid analysis of the interaction between JAZ9 mutant proteins and
wild-type MYC2, MYC3 and MYC4 proteins. JAZ9-4A contains S218A, V219A,
P220A and Q221A mutations, and JAZ9-AA contains R223A and K224A mutations.
The experiment was repeated three times with the same results. DB, Y2H bait
vector carrying the LexA DNA-binding domain. AD, Y2H prey vector carrying the
B42 activation domain. b, Yeast two-hybrid analysis between wild-type JAZ9
protein and MYC3 proteins with mutations in the JID (left) or TAD (right).
The experiment was repeated three times with the same results. c, d, Alanine
replacements of JAZ9-interacting amino acids of MYC3 increase MYC3 target
gene expression in wild-type Col-0 and coi1-30 mutant plants, respectively.
Arabidopsis protoplasts from wild type (c) or coi1-30 mutants (d) were
transfected with an MYC3-responsive pJAZ2::GUS reporter together with yellow
fluorescent protein (YFP) alone or MYC3-YFP constructs under control of
the native MYC3 (c) or cauliflower mosaic virus 35S (d) promoter as indicated.
A 35S::LUC reporter construct was co-transfected as a control. GUS activities
were normalized to the luciferase activity. Data shown are means of four
independent transfections (n = 4 biological replicates; error bars, s.d.).
Different letters above the columns indicate significant differences
(P < 0.05) in pJAZ2::GUS reporter activities from transient expression of
the indicated MYC3 variants, as determined by Tukey-Kramer multiple
comparison analysis. DY, the D94A/Y97A double mutant; LM, the L152A/M155A
double mutant. The experiment was repeated three times with similar results.


interactions with JAZ repressors and is required for MYC3-JAZ9 
complex formation (Figs 2 and 3b and Extended Data Fig. 6), we 
explored the intriguing possibility that MYC3 binding of JAZ9 
and MED25 is mutually exclusive. To test this prediction in a 
defined system, we performed AlphaScreen interaction assays between 
MED25(407-680) and both MYC3(44-238) and MYC3(5-242) in 
the presence of increasing amounts of untagged Jas22^JAZ9 peptide.
As shown in Fig. 5b, the JAZ peptide competitively inhibited the
MYC-MED25 interaction with an IC50 of ~420 nM (MYC3(44-238))
and ~490 nM (MYC3(5-242)). We further tested competition
in planta by transiently expressing combinations of tagged MED25,
JAZ9 and MYC3 in Nicotiana tabacum leaves. As shown in Fig. 5c,
co-immunoprecipitation of MED25 with MYC3 was strongly reduced
upon co-expression of JAZ9. Together, these results indicate that
the Jas motif of JAZ proteins and the ACID domain of MED25 probably
bind to a shared MYC3 surface, and that JAZ repressors can
compete with MED25 (and possibly other co-activators) for interaction
with MYC3 in vitro and in planta.

In the past decade, despite the identification of analogous mechanisms of
hormone perception and transcriptional gene regulation that underpin several
hormone signal transduction pathways in plants, no crystal structures
of the transcriptional-repressor-transcription-factor complexes
Figure 4 | Distinct conformations of the Jas helix in the COI1-JAZ
co-receptor complex versus the JAZ-MYC complex. a, Structure overlay of the
MYC3-Jas^JAZ9 complex with the COI1/ASK1-jasmonate-Ile-Jas^JAZ1 complex
(Protein Data Bank accession number 3OGL). b, Close-up of the Jas motif
overlay from the MYC3-Jas^JAZ9 complex (pink) and from the COI1/ASK1-
jasmonate-Ile-Jas^JAZ1 complex (blue). c, Alignment of the Jas motifs from
JAZ1 and JAZ9. Mutationally analysed conserved single residues and
N-terminal amino-acid stretches in JAZ9 are highlighted in bold. The position
and sequence of JAZ9 fragments are indicated below the simplified JAZ9
diagram. d, Yeast two-hybrid analysis of the interaction between JAZ9 and
either COI1 (top) or MYC3 (bottom). The experiment was repeated three times
with the same results. e, Yeast two-hybrid analysis of the interaction between
JAZ9 C-terminal fragments and either COI1 (top) or MYC3 (bottom). One
micromolar coronatine (COR) was used in yeast two-hybrid assays for the
COI1-JAZ interaction. The experiment was repeated three times with similar
results. f, Quantitative yeast two-hybrid analysis of the interaction between
JAZ9 truncations and COI1, with β-galactosidase reporter gene activity
determined by Beta-Glo assay (n = 3 biological replicates; error bars, s.d.).
Different letters above the columns indicate significant differences
(P < 0.05) in COI1 interaction with the indicated JAZ9 fragments at a given
concentration of COR (that is, 1 μM or 10 μM), as determined by two-way
analysis of variance (ANOVA) with Bonferroni post-test. The experiment
was repeated three times with similar results.


have been solved. The crystal structure of the MYC-JAZ complex 
reported here therefore provides the first structural insight into 
the mechanism of transcriptional repression in plant hormone signal- 
ling. Our structural, biochemical and in planta analyses suggest 
that JAZ repressors use a novel dual repression mechanism, which 
involves not only epigenetic modifications of the target gene chro- 
matin structure through TOPLESS co-repressors, as demonstrated 
previously, but also direct inhibition of MYC binding to MED25
(and possibly other co-activators), as an integral part of a mechanism 
of preventing transcriptional activation of jasmonate response genes 
(Extended Data Fig. 8). In addition, we have discovered distinct 
JAZ conformations in the MYC-JAZ resting complex versus the 
JAZ-COI1 hormone-activated complex, providing the first structural
insight into the switch mechanism between transcriptional
Figure 5 | Jas motif peptide competitively inhibits the MYC3-MED25
interaction. a, MYC3 interacts with the ACID domain of MED25. Note that
sequences N-terminal to ACID contribute to binding and/or stability.
AlphaScreen interaction assay between biotinylated MYC3(44-238) and
His6Sumo-MED25 fragments (n = 3 technical replicates; error bars, s.d.).
***Significant differences (P < 0.001) compared with the no-MYC control by
Student's t-test. VWF-A, von Willebrand factor A domain (responsible for
Mediator binding); MD, middle domain; Q, Q-rich C-terminal region. The
experiment was repeated three times with similar results. b, Jas22^JAZ9 peptide
competes with MED25 for MYC3 binding. AlphaScreen competition assay
(n = 3 technical replicates; error bars, s.d.). Jas22^JAZ9: untagged
JAZ9(218-239). MED25: His6Sumo-MED25(407-680). MYC3(5-242): biotin-MYC3(5-242).
MYC3(44-238): biotin-MYC3(44-238). The experiment was
repeated three times with similar results. An enlarged version of b with visible
error bars is shown in Supplementary Fig. 1, and the associated original data in
Supplementary Table 1. c, Interference of the MYC3-MED25 interaction by
JAZ9 in planta. Flag-MED25 (+), haemagglutinin epitope tag (HA)-JAZ9 (+)
and YFP-MYC3 (+), and respective vector controls carrying Flag, HA or YFP
tags (−), under control of the cauliflower mosaic virus 35S promoter were
transiently expressed in N. tabacum leaves. Protein extracts were
immunoprecipitated (IP) with an anti-YFP antibody and analysed by western
blot (WB) with HA-, Flag- or YFP-specific antibodies. The experiment was
repeated three times with similar results. The original blots from which the
images were cropped are shown in Supplementary Fig. 2.


repression and hormone-dependent transcriptional activation in a 
major plant hormone signalling pathway. 
METHODS


Protein preparation. Wild-type MYC3(44-238) was expressed as a fusion pro- 
tein with a cleavable N-terminal His6Sumo tag from a modified pSUMO 
(LifeSensors) expression vector. BL21 (DE3) cells transformed with the expression 
plasmid were grown in Luria-Bertani broth at 16 °C to an absorbance (A600 nm) of
~1.0 and induced with 0.1 mM IPTG for 16 h. Cells were harvested, resuspended
in 100 ml extract buffer (20 mM Tris, pH 8.0, 200 mM NaCl, and 10% glycerol) per
6 l of cells and passed three times through a French press with pressure set at
1,000 Pa. The lysate was centrifuged at 40,900g in a Sorvall SS34 rotor for 30 min, 
and the supernatant was loaded on a 50 ml nickel HP column. The column was 
washed with 600 ml of 10% buffer B (20 mM Tris, pH 8.0, 200 mM NaCl, 500 mM 
imidazole and 10% glycerol) and eluted with 200 ml of 50% buffer B, followed by 
100 ml of 100% buffer B. The eluted His6Sumo-MYC3(44-238) was dialysed 
against extract buffer and cleaved overnight with SUMO protease at a
protease:protein ratio of 1:1,000 at 4 °C. The cleaved His6Sumo tag was removed by passing
through a 5 ml nickel HP column, and the protein was further purified by chro- 
matography through a HiLoad 26/60 Superdex 200 gel filtration column in 25 mM 
Tris, pH 8.0, 200 mM ammonium acetate, 1 mM dithiothreitol and 1 mM EDTA. 
To prepare the protein-ligand complex, we mixed Jas22^JAZ1, Jas22^JAZ8, Jas22^JAZ9
or Jas22^JAZ12 peptides with purified MYC3(44-238) proteins at a 1.5:1 molar ratio.
The expression and purification of MYC3(5-242) followed the same method as 
for MYC3(44-238) described above. To prepare the protein-ligand complex, we 
mixed Jas22^JAZ1, Jas22^JAZ8, Jas22^JAZ9 or Jas22^JAZ12 peptides with purified
MYC3(5-242) proteins at a 1.5:1 molar ratio. To prepare MYC3(44-238) seleno- 
methionyl (Se-Met) protein for phase determination, we followed the same 
methods as described previously. Purification of Se-Met MYC3(44-238) proteins
followed the same protocol as for MYC3(44-238) native protein, except
that the procedure was performed more quickly to avoid protein oxidation.
The MYC3(5-242)-Jas22^JAZ9 complex was constructed as a fusion protein
with His6Sumo-MYC3(5-242) at the N terminus and Jas22^JAZ9 at the
C terminus, separated by a flexible GSAGSAGSAGSA (4×GSA) linker
(His6Sumo-MYC3(5-242)-4×GSA-Jas22^JAZ9). The expression and purification
of the fusion protein followed the same methods as for MYC3(44-238).
Small-scale purification of His6Sumo-tagged MYC2/3/4 protein fragments 
(including JID and TAD domain) for binding studies with Jas peptides followed 
the same methods as for MYC3(44-238), except that the His6Sumo tag was 
not removed. Small-scale purification of His6Sumo-tagged MED25 protein 
fragments (including the ACID domain) for binding studies with biotinylated 
MYC3(44-238) followed the same methods as for MYC3(44-238), except 
that the His6Sumo tag was not removed. To express and purify biotinylated 
MYC3(44-238) and MYC3(5-242) protein for binding studies (Fig. 5a) and JAZ 
Jas competition assays (Fig. 5b), we followed the methods described previously.
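The 1.5:1 peptide:protein molar ratio used above translates into a peptide mass that depends on both molecular weights. The sketch below is a minimal illustration of that arithmetic, assuming masses in mg and molecular weights in daltons (g/mol). The function name and the example molecular weights (~22 kDa protein, ~2.6 kDa peptide) are hypothetical and are not values reported in this study.

```python
def peptide_mass_for_molar_ratio(protein_mg: float,
                                 protein_mw_da: float,
                                 peptide_mw_da: float,
                                 ratio: float = 1.5) -> float:
    """Peptide mass (mg) giving `ratio` mol peptide per mol protein.

    Hypothetical helper: molecular weights in daltons (g/mol), masses in mg.
    """
    if protein_mw_da <= 0 or peptide_mw_da <= 0:
        raise ValueError("molecular weights must be positive")
    protein_umol = protein_mg / protein_mw_da * 1e3  # mg / (g/mol) = mmol, x1e3 -> umol
    peptide_umol = ratio * protein_umol              # e.g. 1.5 mol peptide per mol protein
    return peptide_umol * peptide_mw_da / 1e3        # umol x (g/mol) = ug, /1e3 -> mg


# Hypothetical example: 10 mg of a ~22 kDa MYC3 fragment and a ~2.6 kDa Jas peptide.
print(round(peptide_mass_for_molar_ratio(10.0, 22_000.0, 2_600.0), 3))
```

Because the ratio is molar, the result simply scales as ratio × protein mass × (peptide MW / protein MW).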
Crystallization. The apo-MYC3(5-242) crystals were grown at 20 °C for 3 days in
sitting drops containing 0.2 μl of purified MYC3(5-242) protein at a concentration
of 10 mg ml−1 and 0.2 μl of well solution containing 0.2 M magnesium chloride,
0.1 M Tris, pH 8.5, and 30% (w/v) polyethylene glycol 4000. The Se-Met
MYC3(44-238) crystals were grown at 20 °C in sitting drops containing 0.2 μl of
the purified protein at a concentration of 15 mg ml−1 and 0.2 μl of well solution
containing 0.2 M sodium chloride, 0.1 M Bis-Tris, pH 5.5, and 25% (w/v)
polyethylene glycol 3350. Crystals of about 100 μm in length appeared in 3 days.
The MYC3(44-238)-Jas22^JAZ9 complex crystals were grown at 20 °C for 3 days in
sitting drops containing 0.2 μl of the purified complex proteins at a concentration
of 15 mg ml−1 and 0.2 μl of well solution containing 0.2 M magnesium chloride,
0.1 M Tris, pH 8.5, and 30% (w/v) polyethylene glycol 4000. The
MYC3(44-238)-Jas22^JAZ1 complex crystals were grown at 20 °C in sitting
drops containing 0.2 μl of the purified complex proteins at a concentration of
15 mg ml−1 and 0.2 μl of well solution containing 3.5 M sodium formate.
Crystals of about 80 μm in length appeared in 2 days. The MYC3(5-242)-Jas22^JAZ9
fusion protein crystals were grown at 20 °C in sitting drops containing
0.2 μl of the purified fusion proteins at a concentration of 15 mg ml−1 and 0.2 μl of
well solution containing 0.2 M magnesium nitrate and 20% (w/v) polyethylene glycol
3350. Crystals of about 100 μm in length appeared in 3 days. All crystals were
serially transferred to the well solution with 20% (v/v) ethylene glycol before flash
freezing in liquid nitrogen.
Data collection and structure determination. Data collections were performed 
at sector 21-ID (LS-CAT) beam lines of the Advanced Photon Source synchrotron 
using single native MYC3(5-242) crystals, MYC3(44-238)-Jas22^JAZ9 complex
crystals and Se-Met-substituted MYC3(44-238) crystals at different wavelengths.
All diffraction data were processed using XDS and merged using Aimless of the
CCP4 suite. Initial phasing was attempted by the SAD method based on anomalous
diffraction of sulfur atoms, as previously described. S-SAD phasing using a
combined data set of 11 native MYC3(5-242) crystals collected at 1.77 Å was not
successful, probably because of non-isomorphism of the individual crystals. To
solve the phase problem, Se-Met-substituted MYC3(44-238) crystals were
prepared and a single data set was collected at a peak wavelength of 0.9787 Å
(Extended Data Table 1). Se-SAD phasing was performed using the Phenix
AutoSol program. Five out of six selenium atoms were found, with a figure of
merit (FOM) of 0.41. The Phenix AutoBuild program generated an initial
model of 286 residues with Rwork/Rfree of 0.35/0.40. The model was further
improved using Coot and refined using the Refmac program in CCP4 (ref. 34)
to a final model with an R factor of 0.21 and an Rfree of 0.26. The MYC3(5-242)
apo structure and the MYC3(5-242)-Jas22^JAZ9, MYC3(44-238)-Jas22^JAZ9 and
MYC3(44-238)-Jas22^JAZ1 complex structures were solved by molecular
replacement using Phaser, with the Se-Met MYC3(44-238) structure as a search
model. The models of the α1′/α1 helices in the MYC3(5-242) structure, of the
Jas22^JAZ9 peptide in the MYC3(44-238)-Jas22^JAZ9 and MYC3(5-242)-Jas22^JAZ9
complex structures, and of the Jas22^JAZ1 peptide in the MYC3(44-238)-Jas22^JAZ1
complex structure were built on the basis of the electron density maps using Coot.
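For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R factor is R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, and Rfree is the same quantity computed only over a test set of reflections withheld from refinement. The Python sketch below computes both from toy amplitudes. It assumes Fcalc has already been scaled to Fobs (refinement programs such as Refmac handle scaling internally), and the function names are illustrative, not part of CCP4 or Phenix.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), assuming f_calc is pre-scaled."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  free_flags: Sequence[bool]) -> Tuple[float, float]:
    """Rwork over refined reflections, Rfree over the flagged test reflections."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Toy amplitudes (not data from this study); the last reflection is in the test set.
print(r_work_r_free([100.0, 200.0, 300.0, 150.0],
                    [90.0, 210.0, 270.0, 165.0],
                    [False, False, False, True]))
```

An Rfree noticeably higher than Rwork, as in the 0.21/0.26 pair above, is the usual pattern, because the test reflections do not influence the refinement.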
AlphaScreen luminescence proximity assays. In vitro interactions between 
MYC3 and Jas peptides or MED25 fragments were assessed by luminescence 
proximity AlphaScreen (PerkinElmer) technology as described previously.
Reactions contained 50 nM His6Sumo-MYC3 protein bound to nickel-acceptor
beads and 50 nM synthesized biotinylated Jas peptides bound to streptavidin
donor beads (Extended Data Figs 1d and 2b) or 50 nM His6Sumo-MED25-
ACID protein bound to nickel-acceptor beads and 50 nM biotin-MYC3(44-238) 
bound to streptavidin donor beads (Fig. 5a). The results were based on an average of 
three experiments with standard errors typically less than 10% of the measurement. 
For the competition assay (Fig. 5b), non-biotinylated Jas22^JAZ9 peptide was
added into the reaction at concentrations of 0, 5, 10, 100, 300, 1,000, 3,000, 
10,000, 30,000 and 100,000 nM. The results were based on an average of three 
experiments with standard errors typically less than 10% of the measurement. 
Mutagenesis. Site-directed mutagenesis was performed using the QuickChange 
Method (Agilent). Mutations for all plasmid constructs were confirmed by 
sequencing. 
Yeast two-hybrid assays. Most of the constructs for yeast two-hybrid assays used 
in this study were described previously. The coding sequences of full-length
MYC2, MYC3 and MYC4, and MYC N-terminal and C-terminal fragments were 
PCR-amplified and cloned into pGilda (Clontech). PCR-based deletions were 
conducted following the manufacturer’s protocol (Stratagene). Detailed protocols 
for yeast two-hybrid assays were described previously.
Plant materials and growth conditions. Arabidopsis plants used in this study 
were previously described or were ordered from the Arabidopsis Biological
Resource Center (www.arabidopsis.org). Arabidopsis seeds were stratified for 
3 days at 4 °C before planting. The soil-grown plants were placed in a controlled 
growth chamber at 23 °C with a 12-h-day (80 μmol s−1 m−2 cool-white fluorescent
light)/12-h-night cycle.
Transient expression in tobacco leaves and Arabidopsis protoplasts. For tran- 
sient expression in tobacco leaves, coding sequences of JAZ9, MED25 and MYC3 
were cloned into pJYP003, pJYP011 and pJYP018 (J.Y. and S.Y.H., unpublished 
observations), respectively, to create p35S:3×HA-JAZ9, p35S:3×Flag-MED25
and p35S:YFP-MYC3 fusion constructs, which were transfected as previously
described. Protein extracts were immunoprecipitated with an anti-YFP antibody
and analysed by western blot with anti-HA, anti-Flag or anti-YFP antibodies as
previously described. For transient expression in Arabidopsis mesophyll protoplasts,
MYC3 (no stop codon) with or without its promoter (the 2-kb sequence upstream of
the start codon) was PCR-amplified and cloned into the pENTR-D/TOPO vector
(Invitrogen) to create entry clones. Then, the MYC3 or pMYC3:MYC3 inserts
were introduced into pSAT4A-DEST-Venus or pBR-DEST-Venus (J.Y. and
S.Y.H., unpublished observations) to create the p35S:MYC3-YFP or
pMYC3:MYC3-YFP constructs. The JAZ2 promoter (the 2-kb sequence upstream of its
start codon) was cloned into pBR-Gus (J.Y. and S.Y.H., unpublished observations)
to create pJAZ2:GUS reporter constructs. The transient expression assays followed
a published protocol, with pBS-35S-Luc (a 35S::LUC reporter construct)
co-transfected as a transfection control. GUS activities were normalized
to the luciferase activity.
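The normalization described above (GUS activity divided by luciferase activity for each transfection, then summarized as mean ± s.d. across independent transfections, as in Fig. 3c, d) can be sketched as follows. The readings are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) for the error bars is an assumption.

```python
import statistics
from typing import Sequence, Tuple


def normalized_reporter(gus: Sequence[float], luc: Sequence[float]) -> Tuple[float, float]:
    """GUS/LUC ratio per transfection, then mean and sample s.d. across replicates."""
    if len(gus) != len(luc) or len(gus) < 2:
        raise ValueError("need paired readings from at least two transfections")
    ratios = [g / l for g, l in zip(gus, luc)]  # corrects for transfection efficiency
    return statistics.mean(ratios), statistics.stdev(ratios)


# Hypothetical readings from four independent transfections (n = 4).
mean_ratio, sd_ratio = normalized_reporter([1200, 1500, 1350, 1100], [100, 120, 110, 95])
print(f"{mean_ratio:.2f} +/- {sd_ratio:.2f}")
```

Dividing by the co-transfected luciferase signal is what allows reporter activities from separate protoplast transfections to be compared.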
Root growth inhibition assay. Arabidopsis wild-type (Col-0) and med25 (pft1-2; 
SALK_129555) seedlings were used for the root growth inhibition assay. Seeds were
surface-sterilized, stratified at 4 °C and germinated on half-strength Murashige and
Skoog agar plates containing 1 μM, 3 μM or 10 μM MeJA or 0.1% DMSO (control).
Plates were placed vertically in a growth chamber (16 h light/8 h dark cycle,
100 μE s−1 m−2 light intensity) for 10 days before pictures were taken, and root
lengths were measured with ImageJ software (http://rsbweb.nih.gov/ij/).
Bacterial infection. Pseudomonas syringae pathovar tomato (Pst) DC3000 infection
assays in Arabidopsis plants were performed as described previously. Briefly,
5-week-old Arabidopsis plants were dip-inoculated with Pst DC3000 bacterial
suspension (1 × 10^8 colony-forming units per millilitre, 0.025% Silwet-L77).
Bacterial growth was determined 3 days after inoculation.

HDX mass spectrometry. HDX of MYC3(5-242) in 20 mM Tris pH 8.0, 200 mM 
ammonium acetate, 1 mM EDTA and 7% glycerol was performed at 4 °C using an 
automated system described previously. Briefly, protein was incubated in a D2O
buffer for a range of exchange times from 10 s to 1 h before quenching the deuterium
exchange reaction with an acidic quench solution (pH 2.4) containing
3M urea and 1% TFA. All mixing and digestions were performed on a LEAP 
Technologies Twin HTS PAL liquid handling robot housed inside a temperature- 
controlled fridge. Protein digestion was performed in-line with chromatography 
using an immobilized pepsin column. Mass spectra were acquired on a Q Exactive 
hybrid quadrupole-Orbitrap mass spectrometer (ThermoFisher Scientific). 
Percentage deuterium exchange values for peptide isotopic envelopes at each time 
point were calculated and processed using Workbench software.

The unit of measurement represented as a single value is the percentage deu- 
terium incorporation, which is determined by initially calculating the intensity 
weighted average (centroid) of all spectral data within defined m/z limits. The 
percentage deuterium incorporation is then determined by comparing the result to 
defined minimum (0%) and maximum (100%) m/z values for each peptide. The 
minimum and maximum m/z values are determined using experimentally 
observed undeuterated and fully deuterated controls. 
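As a minimal sketch of the calculation just described, assuming a peak list of m/z values and intensities for one isotopic envelope, plus centroid m/z values for the undeuterated (0%) and fully deuterated (100%) controls, the percentage deuterium incorporation is the linear position of the sample centroid between the two controls. Function names and numbers are illustrative only; this is not the Workbench implementation.

```python
from typing import Sequence


def centroid_mz(mz: Sequence[float], intensity: Sequence[float]) -> float:
    """Intensity-weighted average m/z of an isotopic envelope."""
    total = sum(intensity)
    if total <= 0:
        raise ValueError("envelope has no signal")
    return sum(m * i for m, i in zip(mz, intensity)) / total


def percent_deuterium(centroid: float, undeuterated: float, fully_deuterated: float) -> float:
    """Place a centroid on the 0% (undeuterated) to 100% (fully deuterated) scale."""
    span = fully_deuterated - undeuterated
    if span <= 0:
        raise ValueError("fully deuterated control must exceed undeuterated control")
    return 100.0 * (centroid - undeuterated) / span


# Toy envelope and control centroids (not data from this study).
c = centroid_mz([500.0, 500.5, 501.0], [1.0, 2.0, 1.0])
print(percent_deuterium(c, undeuterated=500.25, fully_deuterated=501.25))  # prints 25.0
```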

The data representing each peptide are reduced to single values in the following 
manner. For each sample, the triplicate measurements of percentage deuterium
incorporation at each time point are averaged. The mean of these values across
time points is then presented as a single value, representing the overall change in
deuterium incorporation for the sample. The first number in brackets represents
the propagated error for the sample, which is determined by a root-mean-square
approach using the standard deviations from each individual time point. The
second number in brackets is the charge state of the detected peptide.
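The reduction to a single value per peptide can be sketched as below. The text does not give the exact propagation formula, so this sketch assumes the error is the root-mean-square of the per-time-point standard deviations (the sample s.d. of each triplicate); the function name and input layout are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def summarize_peptide(replicates_per_timepoint: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Collapse %D replicates for one peptide into (mean, propagated error).

    Each inner sequence holds the triplicate %D values for one time point.
    Assumed error model: root-mean-square of the per-time-point s.d. values.
    """
    means: List[float] = [statistics.mean(r) for r in replicates_per_timepoint]
    sds: List[float] = [statistics.stdev(r) for r in replicates_per_timepoint]
    overall = statistics.mean(means)
    rms_error = math.sqrt(sum(sd ** 2 for sd in sds) / len(sds))
    return overall, rms_error


# Hypothetical triplicates at two exchange time points.
print(summarize_peptide([[10.0, 12.0, 14.0], [20.0, 20.0, 20.0]]))
```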

Extended Data Fig. 4a shows the cumulative peptide fragments that were detected
in tandem mass spectrometry. Shorter fragments (four to ten residues) provide 
higher resolution information than longer peptides. Therefore, they supersede 
longer fragments (more than ten residues) and were used to manually overlay 
onto the atomic structure as in the case of Fig. 2b. No subtraction was used. The 
peptide set used for structural overlay contained the shortest fragments from the 
complete data set (all peptides). 

Data analysis, statistics and experimental repeats. The specific statistical 
method used, the sample size and the results of statistical analyses are described 
in the relevant figure legends. No statistical methods were used to predetermine
sample size. Sample size was determined on the basis of experimental trials and in 
consideration of previous publications on similar experiments to allow for con- 
fident statistical analyses. The experiments were not randomized. The investiga- 
tors were not blinded to allocation during experiments and outcome assessment. 
All reported results were reproduced in at least three independent experiments. 
Extended Data Figure 1 | Mapping of the JAZ9-MYC3 interface. a, b, Yeast
two-hybrid analysis of the interaction between LexA-JAZ9 and B42(AD)-MYC3
constructs. Simplified diagrams of MYC3 (a) and JAZ9 (b) proteins are
shown on top. Blue yeast colonies indicate a positive interaction between
two proteins. The experiment was repeated three times with the same results.
c, Sequence alignment of the Jas motif of the 12 A. thaliana JAZ proteins. The
N-terminal five amino acids that are unwound in the crystal structure of the
COI1-ligand-JAZ co-receptor complex are indicated by a red line on top of
Jas^JAZ1. Asterisks denote amino acids conserved in all of the sequences;
colons denote similar amino acids. d, Interaction between purified His6Sumo-
MYC3 N-terminal proteins and biotinylated Jas motif peptides by AlphaScreen
luminescence proximity assay (n = 3 technical replicates; error bars, s.d.).
***Significant differences (P < 0.001) compared with the no-Jas control by
Student's t-test. The experiment was repeated three times with similar results.
Extended Data Figure 2 | Interactions of MYC2 and MYC4 with Jas motifs
of JAZ8, JAZ9 and JAZ12. a, Yeast two-hybrid assays between MYC
N-terminal proteins and full-length JAZ9. The experiment was repeated three
times with the same results. b, AlphaScreen assay between His6Sumo-tagged MYC
N-terminal proteins and biotinylated JAZ peptides (n = 3 technical replicates;
error bars, s.d.). ***Significant differences (P < 0.001) compared with the no-Jas
control by Student's t-test. The experiment was repeated three times with
similar results.
Extended Data Figure 3 | Arrangement of secondary structure elements in MYC3(5–242). a, Rainbow colour scheme of MYC3(5–242) in two orientations, from blue (N terminus) to red (C terminus). b, Secondary structure elements overlaid on the sequence alignment of MYC2, MYC3 and MYC4 N-terminal proteins. Note that α1′ (solid line) is a loop with partial helix character and is connected to α1 by a 90° kink.
Extended Data Figure 4 | Surface accessibility and structural dynamics of MYC3(5–242) revealed by HDX. a, HDX heat map of MYC3(5–242). The colour bar indicates the percentage deuterium exchange. Three experimental repeats were performed for each HDX time point. b, HDX heat map overlaid onto the MYC3(5–242) apo structure. Peptides corresponding to the JID helix were not resolved (no HDX information, grey colour), preventing a definitive assessment of the dynamics of the JID helix in solution.
Extended Data Figure 5 | B-factor representations of the four crystal structures. The B-factor indicates the dynamic mobility of different resolved parts within the structure: the thicker the lines and the warmer the colour, the higher the mobility. Other than two linker regions (linker), the three helices that can occupy the JID helix have the highest B-factors in all four structures. The difficulties in crystallizing the MYC3(5–242)–Jas^JAZ9 complex are therefore probably caused by its high conformational flexibility, due to the presence of all three dynamic helices as well as the unfolding of the α1′/α1 helix. Covalent fusion to MYC3(5–242) probably reduces the conformational flexibility of Jas^JAZ9 and of the complex. Note that the presence of the α1′/α1 helix does not interfere with the ability of MYC3 to bind the JAZ peptide (compare MYC3(5–242) and MYC3(44–238) in Fig. 5b).
Extended Data Figure 6 | Effects of mutations in MYC3 on the interactions between JAZ and MYC3 proteins in yeast two-hybrid assays. The development of blue yeast colonies indicates a positive interaction between two proteins. The experiment was repeated three times with the same results.
Extended Data Figure 7 | MYC3(44–238) in complex with Jas^JAZ1 and Jas^JAZ9, and phenotypes of the Arabidopsis med25 mutant. a, b, MYC3(44–238) in complex with Jas^JAZ1 (amino acids 200–221; grey) overlaid with MYC3(44–238) in complex with Jas^JAZ9 (amino acids 218–239; pink). c, Arabidopsis med25 mutant (pft1-2) plants are less susceptible to P. syringae pathovar tomato (Pst) DC3000 than Arabidopsis wild-type (Col-0) plants. Disease symptoms (chlorotic lesions; upper panel) and bacterial population (lower panel) of Arabidopsis wild-type and med25 mutant (pft1-2) plants 3 days after dip-inoculation with Pst DC3000 at 1 × 10^8 colony-forming units per millilitre (n = 4 biological replicates; error bars, s.e.m.). ***Significant difference (P < 0.001) in bacterial population between Col-0 and med25 mutant plants, as determined by two-tailed t-test. The experiment was repeated five times with similar results and the images presented are representative of five repeats. d, The med25 mutant plants are less sensitive to jasmonate-induced root growth inhibition than wild-type plants. A representative picture (upper panel) and percentages of root growth inhibition (lower panel) of 10-day-old wild-type and med25 mutant (pft1-2) Arabidopsis seedlings after treatment with 0.1% DMSO (control), 1 μM, 3 μM or 10 μM MeJA (n = 15 biological replicates; error bars, s.e.m.). Triple asterisks (***) in different colours indicate significant differences (P < 0.001) between Col-0 and the med25 (pft1-2) mutant at the same MeJA concentration, as determined by two-way ANOVA with Bonferroni post-test. The experiment was repeated four times with similar results and the images presented are representative of four repeats.
Extended Data Figure 8 | A simplified diagram of the core components of the jasmonate signalling cascade. a, In the resting state, jasmonate-responsive gene expression is restrained by a family of JAZ transcriptional repressors. JAZ repressors bind and inhibit the MYC family of transcription factors through (1) direct inhibition and (2) recruiting TOPLESS (TPL) co-repressors either directly or through the NINJA adaptor. TPL in turn recruits histone deacetylases/methyltransferases (not shown) to repress gene expression through chromatin remodelling. b, In response to stress or developmental cues, plants synthesize jasmonate-Ile, which serves as a molecular glue to facilitate the formation of a co-receptor complex between JAZ and COI1. The formation of the COI1–JAZ co-receptor complex leads to ubiquitination and proteasome-dependent degradation of JAZ repressors. c, JAZ-free MYCs interact with the MED25 subunit of the Mediator complex and recruit RNA polymerase II (not shown) to the promoters of jasmonate-responsive genes. Components examined in this study are coloured.
Extended Data Table 1 | X-ray data collection and refinement statistics for MYC3 structures

                                  Native            SeMet             Native            Native            Native
                                  MYC3(5–242)*      MYC3(44–238)*     MYC3(44–238)      MYC3(44–238)      MYC3(5–242)
                                                                      +JAZ9 complex*    +JAZ1 complex*    +JAZ9 complex*
PDB code                          4RRU              4RQW              4RS9              4YZ6              4YWC
Data collection
  Space group                     P3,21             P2₁2₁2₁           P3,21             P3,21             C222₁
  Cell dimensions
    a, b, c (Å)                   85.4, 85.4, 53.7  57.3, 76.6, 85.9  85.8, 85.8, 60.0  86.2, 86.2, 59.4  59.1, 110.6, 161.8
    α, β, γ (°)                   90, 90, 120       90, 90, 90        90, 90, 120       90, 90, 120       90, 90, 90
  Wavelength (Å)                  1.0782            0.9787 (peak)     0.9786            1.078             0.9787
  Resolution (Å)                  50–2.1            50–2.2            50–1.95           50–1.95           50–2.4
  Rsym or Rmerge                  0.054 (0.97)      0.050 (1.22)      0.057 (1.31)      0.049 (0.95)      0.139 (1.08)
  I/σI                            18.2 (2.2)        20.6 (1.8)        23.7 (2.0)        29.7 (2.6)        10.3 (2.0)
  Completeness (%)                100 (100)         99.9 (99.9)       100.0 (100.0)     100.0 (100.0)     99.9 (100.0)
  Redundancy                      7.3 (7.4)         8.2 (8.4)         12.0 (10.8)       12.2 (12.4)       8.2 (8.3)
Refinement
  Resolution (Å)                  50–2.1            50–2.2            50–1.95           50–1.95           50–2.4
  No. reflections                 12780             18734             17922             17940             21103
  Rwork/Rfree                     0.234/0.277       0.214/0.263       0.200/0.234       0.184/0.217       0.239/0.295
  Molecules/asymmetric unit       1
  No. atoms
    Protein                                                           1338
    Ligand/peptide                                                    155
    Water                                                             137
  B-factors
    Protein                       60.0              62.5              46.9
    Ligand/peptide                106.3             94.2              70.5
    Water                         57.4              65.4              58.3
  R.m.s. deviations
    Bond lengths (Å)              0.009             0.008             0.006
    Bond angles (°)               1.34              1.27              1.01

*The X-ray diffraction data were obtained from a single crystal. Values in parentheses are for the highest-resolution shell.

Extended Data Table 2 | Main interacting residues between JAZ9 and MYC3

JAZ9 residue  Distance (Å)  Interaction   MYC3 residue*
R223          3.1           H-bond        M155
              4.0           VdW           M155
              4.0           VdW           W92
S226          2.6           H-bond        D94
              2.8           H-bond        W92
L227          3.8           VdW           L125
              4.1           VdW           L125
              3.9           VdW           L152
              4.2           VdW           M155
R229          2.9           Ionic         D94
              3.4           Ionic         D94
F230          3.9           VdW           VEY
              4.5           VdW           von
              4.1           VdW           E148
              3.8           VdW           F151
              3.9           VdW           F151
L231          4.1           VdW           L125
              4.0           VdW           N126
              4.5           VdW           I122
              4.4           VdW           I122
K233          3.2           H-bond        Y97
              3.4           H-bond        Y96
R234          2.9           H-bond        E143
              3.3           H-bond        E143
              2.9           Ionic         E148
              3.0           Ionic         E148
              3.0           H-bond        E142
K235          3.0           H-bond        N126

*Green, TAD amino acid; cyan, JID amino acid; bold, residues whose mutation to alanine compromised the JAZ–MYC interaction in yeast two-hybrid and AlphaScreen assays. VdW, van der Waals interaction.
Real-time observation of the initiation of RNA polymerase II transcription

Furqan M. Fazal, Cong A. Meng, Kenji Murakami, Roger D. Kornberg & Steven M. Block


Biochemical and structural studies have shown that the initiation of RNA polymerase II transcription proceeds in the following stages: assembly of the polymerase with general transcription factors and promoter DNA in a ‘closed’ preinitiation complex (PIC); unwinding of about 15 base pairs of the promoter DNA to form an ‘open’ complex; scanning downstream to a transcription start site; synthesis of a short transcript, thought to be about 10 nucleotides long; and promoter escape. Here we have assembled a 32-protein, 1.5-megadalton PIC derived from Saccharomyces cerevisiae, and observed subsequent initiation processes in real time with optical tweezers. Contrary to expectation, scanning driven by the transcription factor TFIIH involved the rapid opening of an extended transcription bubble, averaging 85 base pairs, accompanied by the synthesis of a transcript up to the entire length of the extended bubble, followed by promoter escape. PICs that failed to achieve promoter escape nevertheless formed open complexes and extended bubbles, which collapsed back to closed or open complexes, resulting in repeated futile scanning.

Optical tweezers have been used in studies of transcript elongation by RNA polymerase II (Pol II) with a ‘dumbbell’ configuration, consisting of two beads held in separate optical traps and connected by a segment of DNA. One bead was directly attached to Pol II, and the other was attached to the opposite end of the template DNA, which must be at least about 3 kilobases (kb) long to span the distance between the traps. For the study of transcription initiation, we adapted a similar approach to the PIC. Pol II, biotinylated for attachment to one bead, was assembled together with transcription factors on DNA that was end-labelled with digoxigenin for attachment to the other bead. The transcription factors consisted of six general transcription factors (GTFs), namely TATA-binding protein (TBP), transcription factor IIB (TFIIB), TFIIE, TFIIF, TFIIH and TFIIA, together with Sub1 (yeast homologue of human PC4), which is thought to stabilize the PIC (Extended Data Fig. 1). The DNA contained the SNR20 (also known as LSR1) promoter fused to an additional 2.7 kb length of DNA, sufficient to separate the beads by roughly the wavelength of light. The SNR20 promoter bore a mutation resulting in one, rather than many, transcription start sites (TSSs). Two versions of the promoter were used: the otherwise wild-type promoter with the single TSS located 91 base pairs (bp) downstream of the TATA box (referred to as SNR20* long), and a deleted version, in which the TSS was situated 31 bp downstream (SNR20* short), a distance characteristic of metazoan transcription. Both versions of the promoter have been characterized in bulk transcription experiments. The PIC, assembled without the peripheral component TFIIK, was mixed with a 25-fold molar excess of PIC lacking the additional 2.7 kb DNA, to achieve an overall PIC concentration sufficient to avoid dissociation. A twofold excess of TFIIK was added, and dumbbells were formed by reaction of the PIC mixture with anti-digoxigenin-coated and avidin-coated beads (Extended Data Fig. 2).
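Two numbers in this paragraph can be checked with simple arithmetic: how far a 2.7 kb spacer separates the beads, and what fraction of PICs in the 1:25 mixture can actually form a tether. The sketch below assumes a B-form rise of about 0.34 nm per base pair (a textbook value, not stated in the text) and computes the contour length, which is an upper bound on the extension at a few piconewtons; the 100 nM total PIC concentration is taken from the Methods. The function names are hypothetical.

```python
# Back-of-envelope checks for the dumbbell geometry and PIC mixing.
# Assumption (not from the text): B-form DNA rises ~0.34 nm per base pair.
RISE_NM_PER_BP = 0.34


def contour_length_nm(n_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Approximate contour length of double-stranded DNA, in nanometres."""
    return n_bp * rise_nm


def tether_competent_fraction(handle_parts: int, no_handle_parts: int) -> float:
    """Fraction of PICs that carry the DNA handle, given a molar mixing ratio."""
    return handle_parts / (handle_parts + no_handle_parts)


if __name__ == "__main__":
    spacer = contour_length_nm(2700)  # 2.7 kb handle
    print(f"2.7 kb spacer: ~{spacer:.0f} nm")
    frac = tether_competent_fraction(1, 25)  # 1 handle-bearing : 25 handle-free
    print(f"handle-bearing PICs: {frac:.1%} of total")
    print(f"at 100 nM total PIC: ~{100 * frac:.1f} nM can form tethers")
```

Under these assumptions the spacer spans roughly 0.9 μm, and only about 1 PIC in 26 (roughly 4 nM of the 100 nM total) carries a handle; the handle-free majority serves only to keep the overall PIC concentration high.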


In a dumbbell carrying digoxigenin on the upstream end of the DNA, the tension exerted by the optical trap tends to pull the polymerase downstream, in the same direction as transcription, resulting in an ‘assisting-load’ assay (Fig. 1a). Transcription was initiated by the addition of saturating concentrations of all four ribonucleoside triphosphates (rNTPs). Force was maintained during measurements by the use of an optical force clamp, as the location of the polymerase on DNA was tracked with sub-nanometre-level precision. Transcription was signalled by movement of the polymerase (Fig. 1b, SNR20* short promoter) at 29 ± 3 bp s⁻¹ (n = 10, mean ± s.e.m.), consistent with elongation rates observed in previous assays of transcription under similar assisting loads. To confirm the identification of the moving molecules as transcription elongation complexes, we raised the force instantaneously to a value (10–15 pN) that, in our experience, can only be sustained by a stable elongation complex (Fig. 1b, black arrows). Only 2–3% of dumbbells gave rise to transcription elongation complexes, whereas in biochemical assays, about 18% of PICs gave rise to runoff transcripts (Extended Data Fig. 3a). The lower efficiency of initiation in the single-molecule system was attributable to the much lower protein concentrations used (<1 nM, at least tenfold lower than in biochemical assays; Extended Data Fig. 3b).

The onset of polymerase movement at a rate characteristic of transcript elongation was preceded by an almost instantaneous jump (Fig. 1b, red arrows), occurring around 15 ± 2 s (n = 10) after the addition of rNTPs. No such movement was observed in the absence of rNTPs. An interpretation consistent with all other available information is that the polymerase draws downstream DNA into the active centre region to form an extended unwound region, or transcription bubble, and then lurches forward after DNA rewinding and bubble collapse (Fig. 1c). Because one bead is attached to the upstream end of the DNA and the other bead to the polymerase, there is no change in the distance between them when DNA is drawn in from the downstream side. Only once the polymerase is released from its point of attachment at the upstream edge of the bubble (promoter escape), and DNA rewinds to collapse the bubble, does the distance between the beads change and lengthen (Fig. 1c). The size of the jump at the transition to a transcription elongation complex was 70 ± 13 bp (n = 9, mean ± s.e.m.), with a minimum of 32–34 bp and a maximum of about 140 bp.

The jump after promoter release and the corresponding transition to a stable elongation complex are notable in two further respects. First, the bubble does not collapse completely at the jump, because about 15 bp remain unwound in the Pol II active centre as a transcription bubble from the time of open complex formation until the end of transcript elongation. Therefore, the entire length of the unwound region in this initial transcribing complex (ITC) is, on average, approximately 85 bp (70 + 15 bp). Second, because this experiment was performed with the SNR20* short promoter (in which the TSS is located 31 bp downstream of the TATA box), transcription was initiated within the open complex, and the nascent transcript extended to the downstream end of the unwound region before the jump (Fig. 1c). Evidently, a transcript averaging 70 nucleotides, and as long as around 140 nucleotides, is synthesized before promoter clearance and the transition to a stable elongation complex.

Figure 1 | Transcription initiation in assisting-load assay. a, A dumbbell tether (not to scale; subunit Sub1 not shown) is formed between beads (blue) held in separate optical tweezers (pink), with one attached to Pol II (green) in the PIC via an avidin–biotin linkage (yellow, black), and the other to upstream DNA via a digoxigenin linkage (brown, black). As transcription proceeds (green arrow indicates direction), the tether extension increases. b, Representative records of Pol II elongation (dashed grey line denotes the TSS location) after promoter escape (red arrows), with the applied force often stepped up after ~10 s to confirm elongation (black arrows; the associated vertical discontinuity is due to tether stretch). c, The Ssl2 subunit in TFIIH (orange) unwinds the template (blue) and non-template (green) strands of DNA around the active site (white circle, black outline) of Pol II (beige), and creates a transcription bubble (open complex formation). RNA synthesis while still bound to the promoter results in DNA scrunching at the upstream edge of the transcription bubble (ITC), which re-anneals after Pol II enters productive elongation (elongation complex). Distances measured by the assays are indicated (double-headed red arrows). Not to scale.

To observe the process occurring before the jump, presumed to involve the drawing of downstream DNA into the Pol II active centre, we moved the point of attachment of the bead to DNA from the upstream to the downstream end of the template. In this configuration, external forces applied to the polymerase tend to pull it upstream, opposite to the direction of transcription, resulting in a ‘hindering-load’ assay (Fig. 2a). Fewer dumbbells (<2%) yielded transcription elongation complexes in this assay, consistent with previous studies showing that hindering loads reduce polymerase processivity. Dumbbells that elongated did so at 17–18 bp s⁻¹ (Fig. 2b), consistent with previous measurements of Pol II transcription under hindering loads. In contrast to the assisting-load assay, there was no jump at the transition to a stable elongation complex, but rather a gradual distance change (Fig. 2b). The distance change was the same size as the jump in the assisting-load assay, and was observed for both forms of the SNR20* promoter. In the case of the SNR20* long promoter, which initiates transcription downstream (dashed line, Fig. 2b), the distance change reflects open complex formation and scanning to the TSS; in the case of the SNR20* short promoter, which initiates transcription in the open complex (dashed line, Fig. 2b), the distance change reflects open complex formation, extension of the bubble, and transcription in the ITC (Fig. 2c).

Figure 2 | Transcription initiation in hindering-load assay. a, By attaching one bead to downstream DNA, a hindering-load assay was developed (not to scale; Sub1 not shown). Scanning and subsequent transcription events (green arrow indicates transcription direction) resulted in a decrease in tether extension. b, Records illustrating Pol II escape and elongation, with a velocity of ~17–18 bp s⁻¹, collected on SNR20* short (red, left panel) and SNR20* long (blue, right panel) in the presence of rNTPs. The dashed black line denotes the TSS at +1; the solid grey line marks the position of the predicted ~24 bp distance change after open complex formation. c, Distances measured by the assays are indicated (colour scheme same as Fig. 1c).

In the single-molecule system, it was possible to investigate not only PICs that initiated transcription, but also those that failed to do so. Approximately 20% of dumbbells showed movement downstream without the initiation of transcription (Fig. 3 and Extended Data Fig. 4). The downstream movement began with an initial distance change of about 24 bp, which was often punctuated, at the temporal resolution of our assay (~0.1 s), by brief (<2 s) pauses (Fig. 3a, b). After the initial 24 bp, movement continued to a maximum of about 150–200 bp downstream (Fig. 3e), until either the bubble collapsed back to a distance of 24 bp or 0 bp, or the PIC dissociated, as evidenced by rupture of the dumbbell (Fig. 3b, black arrows). Bubble collapse was often followed by a repetition of the downstream movement.

The downstream movement was processive, and was observed under all three conditions examined: SNR20* short with rNTPs (n = 40) (Fig. 3a–d); SNR20* long with rNTPs (n = 19) (Extended Data Fig. 4a); and SNR20* long with dATP (n = 15) (Extended Data Fig. 4b). No movement was observed in the absence of rNTPs or dATP. There were no significant differences in either processivity or velocity among the three conditions. Combining these data yielded a pause-free velocity for downstream movement of 36 ± 1 bp s⁻¹ (n = 24). Because the velocity was unchanged when only dATP was present (and no rNTPs), the movement must have been produced by TFIIH activity, and not by polymerase. The extent of the downstream movement was 94 ± 36 bp (mean ± s.d.) (Fig. 3e). TFIIK, which contains a kinase responsible for phosphorylation of the carboxy-terminal domain of Pol II (refs 18, 19), could be omitted without effect: in the absence of TFIIK there was no change in either the distance (92 ± 33 bp, n = 34) or the velocity (36 ± 2 bp s⁻¹, n = 4) of downstream movement. In about 20% (n = 15 out of 74) of dumbbells that displayed TFIIH activity but failed to initiate transcription, there was a transition to a ‘fast state’, characterized by a velocity of 61 ± 2 bp s⁻¹ and downstream movement through hundreds of base pairs (Fig. 3d). The transition to the fast state was irreversible, and must reflect action of the TFIIH helicase subject to little or no restraint by other GTFs or Pol II.
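The text reports pause-free velocities (36 and 61 bp s⁻¹) but does not describe how they were extracted from the traces. One common approach, shown here only as an illustrative sketch and not as the authors' procedure, is an ordinary least-squares fit of position against time over a segment already judged to be pause-free. The function name `ols_velocity` and the synthetic, noise-free trace are hypothetical.

```python
from typing import Sequence


def ols_velocity(t_s: Sequence[float], x_bp: Sequence[float]) -> float:
    """Least-squares slope (bp/s) of position versus time for one segment.

    Assumes the segment has already been judged pause-free; pause
    detection itself is not modelled here.
    """
    n = len(t_s)
    if n != len(x_bp) or n < 2:
        raise ValueError("need at least two paired samples")
    t_mean = sum(t_s) / n
    x_mean = sum(x_bp) / n
    sxx = sum((t - t_mean) ** 2 for t in t_s)
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in zip(t_s, x_bp))
    if sxx == 0:
        raise ValueError("time values must not all be equal")
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic segment sampled at 100 Hz for 2 s (illustrative only).
    t = [i / 100 for i in range(200)]
    x = [36.0 * ti for ti in t]
    print(f"{ols_velocity(t, x):.1f} bp/s")
```

A fit over many samples is less sensitive to positional noise than a two-point difference, which is why a regression slope is a reasonable default once pauses have been excluded.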
Figure 3 | Records of TFIIH motion for the SNR20* short construct with rNTPs present in hindering-load assay. a, Initial transition from the closed (0 bp) to open (~24 bp predicted distance change, grey line) complex. b, Scanning behaviour, with occasional bubble collapse to the closed complex (blue record) or open complex (green record). c, Infrequent slips in the records (n = 10 of 74) were observed (red arrows). d, Occasional irreversible transition from scanning (shaded region) to a highly processive fast state, occurring at a distance of 130 ± 21 bp (n = 9, mean ± s.e.m.). In all records, black arrows mark tether breakage, probably due to PIC dissociation. e, Histogram of TFIIH processivity, with a peak between 40 and 140 bp (n = 78).
Biochemical evidence for movement driven by the TFIIH helicase was obtained by exonuclease III footprinting of the PIC (Extended Data Fig. 5). Extended regions of unwound DNA were previously revealed by KMnO4 reactivity of yeast and Drosophila promoters in studies of transcriptional activity in vivo. The regions were similar in all cases, extending from about 20 to 60 bp downstream of the TATA box, with the TSSs of the Drosophila promoters near the upstream edge of the unwound region nearest the TATA box, and the TSSs of the yeast promoters near the downstream edge furthest from the TATA box. It had been thought that transcribing polymerases with 15-bp bubbles, located at different positions on individual promoters and revealed collectively in the KMnO4 analysis, gave the appearance of an extended bubble. Our examination of single molecules suggests instead that an extensive unwound region is a characteristic of every individual promoter, rather than some collective property. We obtained similar results for a yeast promoter in which the TSS was located near the downstream edge of the unwound region, and for the same promoter in which the TSS was moved to the upstream edge, as in Drosophila and other metazoans.

The formation of the unwound region is not a consequence of transcription, but instead of TFIIH action, because it occurs in the presence of dATP without rNTPs, and because it is observed even when the TSS is located at the downstream end of the unwound region. TFIIH must act continuously to maintain the unwound region, consistent with previous biochemical studies showing a requirement of TFIIH to prevent premature arrest of ITCs. It is not known what determines the length of the unwound region, nor where the unwound DNA resides in the complex. In the case of bacterial polymerase transcription, approximately 10 bp of DNA drawn into the active centre before the transition to elongation are thought to be accommodated by a ‘scrunching’ mechanism. The possibility of scrunching in eukaryotic transcription has previously been considered, but no evidence has been obtained. Because of rotation in the direction of unwinding by TFIIH, there is unlikely to be the associated torsional strain presumed to occur in the bacterial system. The location and conformational state of the approximately 85 bp of DNA unwound in the Pol II PIC thus remain open questions.

Although most PICs (~80% of PICs in biochemical assay conditions, and 97–98% of PICs under single-molecule assay conditions) fail to yield stable elongation complexes, they are not inert. About 20% of the dumbbells showed downstream movements of polymerase along DNA in the hindering-load assay. There was often an initial movement of 24 ± 2 bp (mean ± s.d.), which we attribute to the formation of an open complex, on the basis of previous studies of the PIC (see Methods).

After open complex formation, downstream movement continued for a total distance of 94 bp, on average, with bubble collapse back to the open or closed complexes and repetition of downstream movement, before either final dissociation of the PIC or rupture of the dumbbell (Extended Data Fig. 6). The movement of 94 bp in the hindering-load assay is noteworthy for two reasons. First, it is in excellent agreement with the results of the assisting-load assays, in which a jump of about 70 bp was observed before the onset of transcription elongation. This jump is attributed to collapse of an extended bubble, leaving the original open complex in place, and 70 + 24 bp (open complex) = 94 bp. Second, the distance of downstream movement in the hindering-load assay ranged from about 30 to 150 bp, that is, about 37–157 bp from the TATA box, similar to the distribution of TSSs in yeast, which are located 40–120 bp downstream from the TATA box. Therefore, downstream movements in the hindering-load assay may be attributed to TSS scanning, which precedes the onset of transcription elongation, as observed in the assisting-load assay, and the initiation of transcription in yeast in vivo.
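The length bookkeeping that links the two assays can be written out explicitly. The symbols are introduced here for illustration only: Δx_jump is the mean assisting-load jump, L_EC the ~15 bp bubble retained by Pol II during elongation, L_open the ~24 bp distance change at open complex formation, L_ITC the total unwound length in the ITC, and Δx_hinder the mean downstream movement in the hindering-load assay.

```latex
\begin{aligned}
L_{\text{ITC}}        &= \Delta x_{\text{jump}} + L_{\text{EC}}   \approx 70 + 15 = 85~\text{bp} \\
\Delta x_{\text{hinder}} &= \Delta x_{\text{jump}} + L_{\text{open}} \approx 70 + 24 = 94~\text{bp}
\end{aligned}
```

The same jump enters both lines; what differs is the residual unwound region it is added to, which is why the total unwound length (85 bp) and the total tether change in the hindering-load assay (94 bp) are not expected to be equal.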

It is commonly noted that TSSs for yeast promoters are spread over a wide region, rather than concentrated near the TATA box, as in metazoans. Nevertheless, as discussed above, our evidence for an extended bubble in the yeast PIC corresponds well with extended bubbles mapped by KMnO4 reactivity in Drosophila. Moreover, TATA-less promoters, which predominate in metazoans as well as in yeast, have several TSSs spread over regions of 50–100 bp in human cells. Our findings from single-molecule studies of the yeast PIC are therefore likely to hold true for other eukaryotes as well, including metazoans.


Online Content: Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
METHODS

Single-molecule optical-trapping assay. The 29-subunit yeast PIC containing biotinylated Pol II was assembled on SNR20* promoter DNA fused to a 2.7 kb DNA ‘handle’. The DNA handle allowed us to form tethers in both the hindering-load and assisting-load assays by incorporating a digoxigenin tag via PCR at either the downstream or upstream end of the DNA, respectively. The constructs containing the handle were mixed in a 1:25 molar ratio with identical PICs assembled on the same promoter DNA, but without the handle, such that the overall concentration of the PIC was 100 nM. PICs assembled on DNA lacking a handle sequence are unable to form tethers, and instead serve to increase the overall concentration of PIC by mass action. The resulting mixture was incubated with a twofold excess of TFIIK at room temperature for 20 min to form the complete 32-subunit PIC. This complete PIC was incubated with both anti-digoxigenin-coated 0.9-μm-diameter beads and avidin-coated 0.6-μm-diameter beads, resulting in tethers being formed with the digoxigenin-containing handle at one end and the biotinylated Pol II in the PIC at the other end. On completion of this step, the concentration of PIC was ~25 nM. In this latter step (and all subsequent steps), the buffer used (50 mM HEPES, pH 7.5, 80 mM potassium acetate, 5 mM MgSO4, 10 mM dithiothreitol (DTT), 10% glycerol) was always supplemented with 250 nM TFIIB and 250 nM TFIIE to stabilize and maintain the PIC. The assembled dumbbell tethers were flowed into a ~5 μl flow chamber, and rinsed with ~10 μl of additional buffer to remove excess beads. Each bead that formed the tether was held in a separate optical trap, allowing controlled load to be applied on the dumbbell by using an active force clamp as previously described. Force uncertainties were estimated at ~15% owing to variations in bead size and systematic calibration errors. The temperature in the trap was estimated to be 26 ± 1 °C (mean ± s.d.). Single tethers were identified as described previously, and held at ~4 pN constant force for ~15–20 s, after which transcription buffer (50 mM HEPES, pH 7.5, 80 mM potassium acetate, 10 mM magnesium acetate, 10 mM DTT, 10% glycerol, 1 U of RNaseOUT (Life Technologies), 250 nM TFIIB and 250 nM TFIIE) containing either 1.6 mM (2×) NTPs or 1.6 mM (2×) dATP was flowed into the flow cell while the tether was held at the same force. In the absence of nucleotides, the dumbbells could be held at about 4 pN without breakage for extended periods. An oxygen-scavenging system (8.3 mg ml⁻¹ glucose (Sigma), 46 U ml⁻¹ glucose oxidase (Calbiochem), 94 U ml⁻¹ catalase (Sigma)) was used to reduce photodamage. Data were collected at 2 kHz sampling frequency, filtered at 1 kHz with an 8-pole Bessel filter (Krohn-Hite) and boxcar averaged over a 20-point window to provide positional feedback to an active force clamp at a rate of 100 Hz. The resulting data were analysed using Igor Pro (Wavemetrics).
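The last data-handling step can be illustrated with a minimal sketch. It assumes non-overlapping 20-sample blocks, which is what makes 2 kHz / 20 = 100 Hz work out; the text does not state whether the window overlapped, and the function name `boxcar_decimate` is hypothetical. The 1 kHz Bessel anti-aliasing filter is analogue hardware and is not modelled.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 2000  # acquisition rate stated in the Methods
WINDOW = 20            # boxcar length stated in the Methods


def boxcar_decimate(samples: Sequence[float], window: int) -> List[float]:
    """Average consecutive, non-overlapping blocks of `window` samples.

    A trailing partial block is dropped, so the output rate is the input
    rate divided by `window`.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n_blocks = len(samples) // window
    return [
        sum(samples[i * window:(i + 1) * window]) / window
        for i in range(n_blocks)
    ]


if __name__ == "__main__":
    one_second = [float(i % 20) for i in range(SAMPLE_RATE_HZ)]  # toy signal
    averaged = boxcar_decimate(one_second, WINDOW)
    print(len(averaged), "feedback updates per second")  # 2000 / 20 = 100
    print(averaged[0])  # mean of 0..19 = 9.5
```

Averaging in non-overlapping blocks both reduces positional noise and sets the feedback rate; a sliding (overlapping) boxcar would smooth equally but would not by itself reduce the update rate.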

Protein purification. TFIIA, TFIIB and TBP were expressed in bacteria, and TFIIE, TFIIF and TFIIH were isolated from yeast. Biotinylated Pol II was isolated as previously published. For the expression of recombinant Sub1 (refs 16, 35–37), the Escherichia coli Rosetta2 (DE3) strain (Stratagene) was transformed with pCold II vector (Clontech) containing the SUB1 (also known as TSP1) gene fused to sequence encoding a C-terminal His6-tag. The cells were grown in 2× YT media at 30 °C, and induced with 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) for 16 h at 15 °C. The cells were then lysed by sonication in a lysis buffer (20 mM Na/K-phosphate buffer, pH 7.5, 500 mM potassium acetate, 10 mM imidazole, 0.1% Triton X-100, 1 mM DTT, 1 mM benzamidine, 100 μM leupeptin, 10 μM pepstatin A and 1 mM PMSF), and the His6-tagged Sub1 was eluted with a gradient of 10–500 mM imidazole in a buffer containing 20 mM Na/K-phosphate buffer, pH 7.5, 300 mM potassium acetate and 5% glycerol. The eluate was further purified using HiTrap Heparin 1 ml (GE Healthcare) and CaptoSP ImpRes (GE Healthcare).

DNA constructs for single-molecule experiments 

SNR20* short (−62/+96) promoter sequence for hindering-load assay. The
sequence of the non-template DNA strand was as follows: 5’-GCCGTTTCC 
GATGGGCCACTCGGTGAAAACATATAAAAAGGGCTCTACATTCATTTT 
TTTTAAATGCCCACGAATCTCTTTTCCTTTCGGGTGGATCAAGTGTAGT 
ATCTGTTCTTTTCAGTGTAACAACTGAAATGACCTCAATGAGGCTCATT 
ACC-3'. 
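Promoter coordinates here are given relative to the TSS (+1) with no position 0, so a −62/+96 fragment spans 62 + 96 = 158 bp. The short Python check below (the helper name is hypothetical) confirms that the non-template sequence above has that length, and that the 60 bp and 540 bp additions described for the other constructs agree with their −122/+96 and −62/+636 coordinates.

```python
# Non-template strand of SNR20* short (-62/+96), copied from the Methods.
SNR20_SHORT = (
    "GCCGTTTCC"
    "GATGGGCCACTCGGTGAAAACATATAAAAAGGGCTCTACATTCATTTT"
    "TTTTAAATGCCCACGAATCTCTTTTCCTTTCGGGTGGATCAAGTGTAGT"
    "ATCTGTTCTTTTCAGTGTAACAACTGAAATGACCTCAATGAGGCTCATT"
    "ACC"
)


def construct_length(upstream, downstream):
    """Length in bp of a fragment spanning upstream..downstream relative to
    the TSS, assuming the numbering skips 0 (-1 is followed by +1)."""
    if upstream >= 0 or downstream <= 0:
        raise ValueError("expected upstream < 0 < downstream")
    return -upstream + downstream


if __name__ == "__main__":
    print(len(SNR20_SHORT), construct_length(-62, 96))  # 158 158
    # Extra length of the long promoter and of the assisting-load template.
    print(construct_length(-122, 96) - construct_length(-62, 96))  # 60
    print(construct_length(-62, 636) - construct_length(-62, 96))  # 540
```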

SNR20* long (−122/+96) promoter sequence for hindering-load assay.
SNR20* long differed from SNR20* short in containing an additional 60 bp of
DNA between the TATA site and the start site. The sequence of the non-template
strand of the longer promoter is shown, with the bases not present in the shorter
promoter underlined: 5’-GCCGTTTCCGATGGGCCACTCGGTGAAAACATA
TAAAAAGGGCTCTACATTCATTTTTTCATCGATGAGTACTTTACTTGTT 
ATCAGATTTATTCATTTTGTTTCTACTTGTTTTTTTTTTTAAATGCCCACG
AATCTCTTTTCCTTTCGGGTGGATCAAGTGTAGTATCTGTTCTTTTCAGT 
GTAACAACTGAAATGACCTCAATGAGGCTCATTACC-3’. 

SNR20* short (−62/+636) promoter sequence for assisting-load assay. Relative to the
hindering-load construct, the SNR20* short sequence for the assisting-load assay
contained an additional 540 bp of downstream DNA derived from the wild-type
SNR20 gene, as shown below: 5'-GCCGTTTCCGATGGGCCACT
CGGTGAAAACATATAAAAAGGGCTCTACATTCATTTTTTTTAAATGCCC
ACGAATCTCTTTTCCTTTCGGGTGGATCAAGTGTAGTATCTGTTCTTTTC 
AGTGTAACAACTGAAATGACCTCAATGAGGCTCATTACCTTTTAATTTG 
TTACAATACACATTTTTTGGCACCCAAAATAATAAAATGGACGGGAAG 
AGACTTTTTAAGCAAGTTGTTTTCCGCTAATGTCAGGTCTCACTACTTT 
TTGCTGCTATTTTTCTTCGCTCATGGTTTCTTCATAAGGCGTTTTTATG 
ATGGTTTTTCGAAATTGGTTTTTGAGACGACGGAATCACGAATTCTGG 
ATCCTTGCTCAAGGTTATTGTTTTTGTTTTCTTCTGGTTTGTTTTCTATTT
TCTTTTTTTTAGCTTTCTGTTTCTCCCTTAGTTTGGCTTTTTGCTTCATA 
CTCTTCCCTGTCTTTCCGAGCCGTTTATGTCCAACGCGGGATTTGGTTT 
TTCTTTATCGATGGGAAGAAATGGTGCTATAGTAGGTTGGGAGATAAT 
ATTTATGGTATGGGGTGCTAGTGCGGATGGGGCGCTCTTATTGTTGAT 
TTCTTCGCTCGTCTTCTTTTTCTGGTGGCGCTGCAAGAGGAAGTTTTTC 
GACTTTGTTATGATTTTTGGTTTGCAAGGAAAGGTGTCTTAC-3’.

Generation of DNA templates. A 2.7 kb DNA fragment that served as a ‘handle’ in
our single-molecule assay was amplified by PCR from the plasmid pRL702 as previously
described, and was subcloned into the pDrive Cloning Vector (Qiagen). To
obtain the three DNA constructs used in this study, three different plasmids were
constructed, each containing the handle adjacent to one of the following promoter
sequences: SNR20* short (−62/+636), SNR20* short (−62/+96) and SNR20* long
(−122/+96). For the assisting-load assay, the handle was located upstream of the
SNR20* short (−62/+636) promoter sequence. For the hindering-load assay, the
handle was situated downstream of the SNR20* short (−62/+96) or SNR20* long
(−122/+96) promoter sequence. Regions containing the promoter and the handle
were amplified by PCR using a 5’-digoxigenin-labelled primer (IDT) that anneals to
the end of the handle, such that PCR products carry a digoxigenin tag on the
upstream end of the DNA for the assisting-load assay, and on the downstream end of
the DNA for the hindering-load assay. The PCR products were loaded onto
TSKgel DEAE-5PW (Tosoh), eluted with a gradient of 0.1-1 M NaCl in a buffer
containing 20 mM Tris, pH 7.5, and 2 mM DTT, and concentrated to 5-10 µM
using Vivacon 500 5K MWCO (Vivaproducts), yielding ~0.15-0.3 nmol from
4-8 ml of PCR reaction.

PIC assembly and isolation. The PIC was isolated as previously published with
minor modifications. First, 0.15 nmol of SNR20* promoter DNA with or without
the 2.7 kb handle was separately mixed with 1.5 nmol of TFIIB, 1.5 nmol of TFIIA,
0.8 nmol of TBP, 0.65 nmol of TFIIE, 0.24 nmol of TFIIH-ΔTFIIK and 0.8 nmol of
Sub1 in 90 µl of buffer (500) (20 mM HEPES, pH 7.6, 5 mM DTT, 2 mM MgSO4
and 5% glycerol, with the mM concentration of potassium acetate in parentheses).
The mixture was dialysed sequentially into buffer (300), buffer (220) and buffer (150),
and then combined with 0.2 nmol of biotinylated Pol II-TFIIF complex. The mixture was
further dialysed into buffer (80), loaded onto a 10-40% (v/v) glycerol gradient
containing 20 mM HEPES, pH 7.6, 5 mM DTT, 2 mM magnesium acetate and
80 mM potassium acetate, and ultracentrifuged for 4 h at 48,000 r.p.m.
(Beckman SW60 Ti rotor). The presence of the 2.7-kb handle did not significantly
affect the efficiency of PIC assembly.
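As a bookkeeping aid, the amounts listed above can be expressed as molar ratios to the promoter DNA. The Python sketch below only does that arithmetic; the dictionary keys are labels chosen here, and the ratios are derived from the listed nanomole amounts rather than reported in the original.

```python
DNA_NMOL = 0.15  # promoter DNA (with or without the handle), from the Methods

# Amounts (nmol) combined with the DNA, as listed in the Methods.
COMPONENTS_NMOL = {
    "TFIIB": 1.5,
    "TFIIA": 1.5,
    "TBP": 0.8,
    "TFIIE": 0.65,
    "TFIIH-dTFIIK": 0.24,
    "Sub1": 0.8,
    "Pol II-TFIIF": 0.2,  # added after dialysis into buffer (150)
}

# Potassium acetate (mM): buffer (500) at mixing, then dialysis steps.
KOAC_SERIES_MM = [500, 300, 220, 150, 80]


def molar_excess(amounts_nmol, dna_nmol=DNA_NMOL):
    """Ratio of each component to promoter DNA, rounded to 2 decimals."""
    return {name: round(n / dna_nmol, 2) for name, n in amounts_nmol.items()}


if __name__ == "__main__":
    for name, ratio in molar_excess(COMPONENTS_NMOL).items():
        print(f"{name:>14}: {ratio}x over DNA")
    # The salt concentration only ever steps down before the gradient.
    assert all(a > b for a, b in zip(KOAC_SERIES_MM, KOAC_SERIES_MM[1:]))
```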

In vitro transcription assay. The transcription assay was performed as described
previously. In brief, 1.5 pmol of DNA fragment was combined with 3.7 pmol of
TFIIB, 3.7 pmol of TFIIA, 1.5 pmol of TBP, 3.7 pmol of TFIIE, 1.5 pmol of
TFIIH, 1.5 pmol of Pol II, 2.1 pmol of TFIIF and 2.5 pmol of Sub1 in 5 µl of buffer
(300) (50 mM HEPES, pH 7.6, 300 mM potassium acetate, 5 mM DTT and 5%
glycerol), diluted with 5 µl of buffer (10) (20 mM HEPES, pH 7.6, 10 mM potassium
acetate, 5 mM MgSO4 and 5 mM DTT), and incubated for more than 1 h
on ice. Transcription was initiated by adding an equal volume of buffer containing
20 mM HEPES, pH 7.6, 10 mM potassium acetate, 5 mM MgSO4, 10 mM
magnesium acetate, 1 U of RNaseOUT, 5 mM DTT, 1.6 mM ATP, 1.6 mM GTP,
1.6 mM CTP, 40 µM UTP and 0.83 µM [α-³²P]UTP (2.5 µCi). The reaction was
stopped after 15 min by adding 185 µl of stop buffer (300 mM sodium acetate (pH
5.5), 5 mM EDTA, 0.7% SDS, 0.1 mg ml⁻¹ glycogen, 0.013 mg ml⁻¹ proteinase
K (Sigma)). Transcripts were precipitated by adding 700 µl of ethanol, dried and
analysed on a denaturing 4-12% acrylamide gel.
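The final concentrations in the transcription reaction follow from simple mixing. The Python sketch below assumes ideal, additive volumes (5 µl + 5 µl, then an equal 10 µl of initiation buffer) and ignores any salt carried in with the protein stocks; the `mix` helper is illustrative.

```python
def mix(c1, v1, c2, v2):
    """Concentration after combining volume v1 at c1 with volume v2 at c2.

    Concentration units pass through unchanged; the two volumes only need
    to share a unit. Assumes ideal, additive volumes.
    """
    return (c1 * v1 + c2 * v2) / (v1 + v2)


# Step 1: 5 ul of factors in buffer (300) diluted with 5 ul of buffer (10).
koac_premix = mix(300, 5, 10, 5)           # mM potassium acetate
# Step 2: an equal volume (10 ul) of initiation buffer with nucleotides.
koac_final = mix(koac_premix, 10, 10, 10)  # mM potassium acetate
atp_final = mix(0.0, 10, 1.6, 10)          # mM; same for GTP and CTP
utp_final = mix(0.0, 10, 40.0, 10)         # uM unlabelled UTP

if __name__ == "__main__":
    print(koac_premix, koac_final, atp_final, utp_final)  # 155.0 82.5 0.8 20.0
```

The same helper reproduces the "2X" convention used elsewhere in the Methods: a 2X stock mixed with an equal volume ends at 1X.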

Exonuclease footprinting. Exonuclease footprinting was performed as described
previously. SNR20* long (−122/+147) was amplified by PCR in a 2 ml reaction using
a ³²P-labelled upstream primer (5’-GCCGTTTCCGATGGGCCACTC-3’) and a
downstream primer (5'-CCATTTTATTATTTTGGGTGCC-3’), and was purified
by electrophoresis in a 2% agarose gel. The labelled DNA (1.5 pmol) was
incubated with 3.7 pmol of TFIIB, 3.7 pmol of TFIIA, 2.0 pmol of TBP, 3.7 pmol
of TFIIE, 2.0 pmol of TFIIH, 2.0 pmol of Pol II-TFIIF complex and 3.8 pmol of
Sub1 in 5 µl of buffer (300) (50 mM HEPES, pH 7.6, 300 mM potassium acetate,
5 mM DTT and 5% glycerol), then combined with 5 µl of buffer (30) (50 mM
HEPES, pH 7.6, 5 mM MgSO4, 30 mM potassium acetate and 5 mM DTT),
and incubated for more than 1 h at 4 °C. The reconstituted PIC was combined
with an equal volume of 2X NTP buffer (1.6 mM NTP(s) or 1.6 mM dATP,
50 mM HEPES, pH 7.6, 5 mM MgSO4, 30 mM potassium acetate, 5 mM DTT,
10 mM magnesium acetate and 5 U of RNaseOUT) and incubated for 4 min at
30 °C. Exonuclease III digestion was performed with 5-10 U of the exonuclease
(NEB) for 9 min at 30 °C, and was stopped by adding 185 µl of stop buffer
(300 mM sodium acetate, pH 5.5, 5 mM EDTA, 0.7% SDS, 0.1 mg ml⁻¹ glycogen,
0.013 mg ml⁻¹ proteinase K (Sigma), 0.5 mg ml⁻¹ salmon sperm DNA
(Invitrogen)). DNA was precipitated by adding 700 µl of ethanol, dried and
analysed on a denaturing 6% acrylamide gel.

Data analysis of single-molecule records. Transcription in the expected direction
resulted in a decrease in tether extension in the hindering-load assay, and an
increase in extension in the assisting-load assay. In both geometries, the
change in extension of the DNA tether, which is a function of the applied force, was
converted to a distance on the template in bp (~0.313 nm bp⁻¹ at ~4 pN load).
The acquired data were smoothed in software by applying a low-pass filter (end
of pass band = 0.1 Hz; start of reject band = 50 Hz; number of coefficients = 500).
To align the records so that motion was defined to start at 0 bp, the mean value of
~1-2 s of positional data before the start of processive motion was set as
the starting (0 bp) distance. The reported velocities of Pol II and TFIIH were
obtained by dividing the observed change in distance by the time over which the
molecule moved. For both Pol II and TFIIH, these velocities were calculated over a
region of at least 50 bp that did not contain any resolved pauses (>0.1-0.2 s). As
some of the pauses were short-lived, especially owing to TFIIH motion, we did not
have sufficient information to characterize the pause lifetimes or distributions, nor
could we reliably use previous techniques to obtain a pause-free velocity. We were
also unable to determine pause-free velocities from the distributions of instantaneous
velocities, as there were often insufficient data to obtain reliable fits. For
the TFIIH records, velocities of the scanning and fast states, when observed,
were occasionally calculated by examining different regions of the records that
were separated in time by a relatively sharp transition (change within 0.5 s) in
velocity. To estimate the processivity of TFIIH during scanning, we included only
molecules that travelled at least 30 bp, sufficient to move beyond the noise
threshold and to extend past the distance of the open complex (24 ± 2 bp).
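The record-processing steps described above (conversion at ~0.313 nm per bp, zeroing on the pre-motion baseline, velocity as distance over time across a pause-free span of at least 50 bp, and the 30 bp travel cutoff) can be sketched in Python as follows. This is not the authors' Igor Pro code: the 1.5 s baseline default, the sign flip for the hindering-load geometry and all function names are assumptions, the low-pass filter and pause detection are omitted, and the caller is trusted to pass a region that is already pause-free.

```python
from statistics import fmean

NM_PER_BP = 0.313  # tether extension per bp at ~4 pN, from the Methods


def extension_to_bp(delta_nm, hindering=True):
    """Convert changes in tether extension (nm) to template distance (bp).

    Assumption: in the hindering-load geometry transcription shortens the
    tether, so the sign is flipped to make forward motion positive.
    """
    sign = -1.0 if hindering else 1.0
    return [sign * d / NM_PER_BP for d in delta_nm]


def align_to_zero(times_s, positions_bp, motion_start_s, baseline_s=1.5):
    """Shift a record so the mean of the pre-motion baseline becomes 0 bp."""
    baseline = [p for t, p in zip(times_s, positions_bp)
                if motion_start_s - baseline_s <= t < motion_start_s]
    if not baseline:
        raise ValueError("no samples in the baseline window")
    offset = fmean(baseline)
    return [p - offset for p in positions_bp]


def mean_velocity(times_s, positions_bp, t_start, t_end, min_span_bp=50.0):
    """Distance moved divided by elapsed time over a caller-chosen region.

    Returns None if the region has fewer than two samples or spans fewer
    than min_span_bp.
    """
    idx = [i for i, t in enumerate(times_s) if t_start <= t <= t_end]
    if len(idx) < 2:
        return None
    i0, i1 = idx[0], idx[-1]
    distance = positions_bp[i1] - positions_bp[i0]
    if abs(distance) < min_span_bp:
        return None
    return distance / (times_s[i1] - times_s[i0])


def keep_processive(records_bp, min_travel_bp=30.0):
    """Keep records whose furthest excursion reaches min_travel_bp."""
    return [r for r in records_bp if max(r) >= min_travel_bp]


if __name__ == "__main__":
    times = [0.5 * i for i in range(21)]  # 0 to 10 s
    raw = [5.0 if t < 2 else 5.0 + 10.0 * (t - 2) for t in times]
    aligned = align_to_zero(times, raw, motion_start_s=2.0)
    print(mean_velocity(times, aligned, 3.0, 9.0))  # 10.0 (60 bp over 6 s)
```

Returning None rather than a number for short spans mirrors the stated rule that velocities were reported only over regions of at least 50 bp.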

Calculating expected open-complex distance. Biochemical studies have shown
that the minimal distance of a TSS from the TATA box is about 30 bp, and that
transcription begins at this location in the initial open complex. In the structure of
the closed PIC, this location in the promoter DNA is about 80 Å from the
nucleotide addition site of Pol II. Open complex formation must bring the TSS
to the nucleotide addition site, which therefore requires drawing 80 Å of downstream
DNA into the Pol II cleft. In the structure of a transcribing complex, all
but 3 bp of downstream DNA are double-stranded, and 80 Å corresponds to 24 bp
of dsDNA, the same as the initial movement observed in the hindering-load assay.
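The 24 bp figure follows from dividing the 80 Å path by the helical rise of dsDNA. Assuming a canonical B-form rise of about 3.4 Å per bp (a textbook value, not stated in the text above):

```latex
\[
  n_{\mathrm{bp}} \approx \frac{80~\text{\AA}}{3.4~\text{\AA}\,\mathrm{bp}^{-1}}
  \approx 23.5 \approx 24~\mathrm{bp}
\]
```

A rise of 3.3 Å per bp gives about 24.2 bp, so the rounded estimate does not depend on this choice.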
Extended Data Figure 1 | The 29-component PIC assembled on the SNR20*
short promoter. a, PIC excluding the kinase domain (TFIIK) was assembled
on SNR20* short (adjacent to the 2.7-kb downstream handle sequence) and
sedimented on a glycerol gradient; fractions were analysed by SDS-PAGE.
b, The results from fraction 12, annotated in detail, indicate that all PIC
components were retained, confirming that the complex reconstituted fully
from the component proteins. The subunit(s) of Pol II are labelled in black,
TFIIF in blue, TFIIE in magenta, TFIIH in orange, TFIIA in cyan, TFIIB in red,
TBP in light green, and Sub1 in dark green. TFIIK (three subunits) was later added


Extended Data Figure 2 | Schematic diagram showing assembly of dumbbells,
in cross-section. PICs were attached to one bead via biotin-avidin linkages
(yellow). To form dumbbell tethers, the other end of a small fraction of the
PICs (4%) had digoxigenin linkages that could be tethered to
anti-digoxigenin-coated beads (black and brown) via a 2.7-kb DNA handle.
PICs not involved in tether formation served to increase the local
concentration of PIC components.

Extended Data Figure 3 | Run-off transcription under single-molecule
assay conditions. a, Isolated PICs (0.1 pmol), formed on the SNR20* short
promoter fragment fused to the transcription template (covering the region
−62/+636) and attached to a 2.7 kb DNA handle, were combined with
increasing amounts of PICs assembled on the SNR20* short promoter but
without the handle, hereafter referred to as PIC (−62/+96). These constituents
were incubated with an equal volume of a 2X NTP solution (10 µl) containing
1.6 mM ATP, 1.6 mM GTP, 1.6 mM CTP, 40 µM UTP and 0.83 µM [α-³²P]UTP
(2.5 µCi). The resulting transcripts were analysed by gel electrophoresis.
PICs fused to the DNA handle failed to support transcription alone (lane 1), but
transcription activity was restored (red arrow) when a 4-fold (lane 2), 8-fold
(lane 3), 12-fold (lane 4) or 15-fold (lane 5) excess of PIC (−62/+96)
was added to the reactions. In lane 6, the reaction contains 1.5 pmol PIC
(−62/+96). The 96-nt run-off transcript from PIC (−62/+96) is indicated
(black arrow). A 25-fold excess of PIC (−62/+96) was used for single-molecule
assays (Extended Data Fig. 2). b, 1.5 pmol aliquots of PIC (−62/+96) were
introduced into different volumes of transcription buffer, such that the assayed
concentration of PIC varied from 37 nM to 4.5 nM. Transcription efficiency
(run-off band, black) decreased with PIC concentration from ~18% to
just 2-3%. The low concentrations used in single-molecule assays (<1 nM)
could not be assayed directly using gels, but we expect that the transcription
efficiency is correspondingly low.
Extended Data Figure 4 | Records of TFIIH scanning on SNR20* long with either
rNTPs or dATP. a, b, Just as for SNR20* short (Fig. 3), the longer promoter
shows TFIIH scanning with either rNTPs (a) or dATP only (b), after which
either the PIC dissociates (black arrows), or the bubble collapses to the closed
(blue and green records) or open (grey line) complex and TFIIH moves again.
The dashed line indicates the position of the TSS (+1).
Extended Data Figure 5 | Exonuclease III footprinting assay of the PIC on
SNR20* long. In the absence of nucleotides in vitro, PICs bound to
the SNR20* long promoter produced barriers to exonuclease III digestion
located ~50 bp downstream of the TATA box (about −40 nucleotides from the
TSS, black arrows). These barriers depended on the presence of TFIIH and also
TFIIE, which interacts with TFIIH. After the addition of dATP, the barriers
disappeared, and the bands at pause positions were intensified between
positions −30 and +30 (~60-120 bp downstream of the TATA box, bracket).


©2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 6 schematic. Left panel, SNR20* long: closed, OC, TSS scanning, TFIIH fast state. Right panel, SNR20* short: closed, OC, ITC, TFIIH fast state.]

Extended Data Figure 6 | The transcription initiation pathway for SNR20* 
long (left) and SNR20* short (right) promoters. Left, a model for the 
initiation pathway on the SNR20* long promoter. States starting from the top: 
Pol II (beige) with attached GTFs (blue) and Ssl2 (orange) binds in its ‘closed’ 
form to the promoter element upstream of the TSS (arrow) on the DNA 
template (green and blue lines). Positions of the enzyme active site (open white 
circle) and TATA box (closed black square) are indicated. Unwinding by 
TFIIH produces an open complex (OC) that leads to bubble formation. 
Arrival of the open complex at the TSS owing to scanning, driven by TFIIH, 
leads to the formation of an extended bubble (dashed lines indicate the 
speculative position of single-stranded DNA). If the complex fails to recognize
the TSS, it can be driven beyond it by TFIIH, resulting in a ‘fast state’ that
produces no RNA but advances at roughly twice the normal rate (black box; see
text). When Pol II recognizes the TSS, it begins transcription of RNA (red line),
corresponding to the ITC. Formation of the ITC leads to bubble collapse,
followed by the loss of GTFs and transition to the elongation complex (EC).
Right, corresponding model for the initiation pathway on the SNR20* short
promoter. The states are similar to those for SNR20* long. In this case, the open
complex does not need to scan for the TSS, which is found within its DNA footprint.
As a consequence, the ITC can form and begin RNA synthesis once the active site
has recognized the TSS. A longer segment of RNA can thereby be produced
before the transition to the elongation complex (EC).
CORRIGENDUM
doi:10.1038/nature14609 


Corrigendum: Passenger deletions 
generate therapeutic vulnerabilities 
in cancer 
Nature 488, 337-342 (2012); doi:10.1038/nature11331


In this Article, during the preparation of Figures 2d and 3a, we processed
digital western blot scans to remove duplicate or otherwise irrelevant
lanes from single-blot images. Although all excisions/mergers originated
from the same gel, these figure constructions should have been explicitly
pointed out. Here we present the unprocessed scans (Supplementary
Information) and amended figures (Figs 1 and 2). Figure 1 of this
Corrigendum shows the corrected Fig. 2d, in which a duplicate run of
cell line D423-MG (lane 2 in the original) was excised between cell lines
D502-MG and U343 (lanes 1 and 3 in the original) and the remaining
halves of the blot (lanes 1 and 3 in the original blot) were spliced together.
This is now indicated by a dashed line. Similarly, Fig. 2 of this
Corrigendum shows the corrected Fig. 3a, in which, for the cell line U87,
an additional non-targeting short hairpin RNA control (original lanes 7
and 8) was excised and the remaining halves of the blot were merged,
which is now indicated by a dashed line. We also note that in the
published Fig. 3a, lanes 1 and 2 of the original U87 vinculin blot were
accidentally used as the loading control for shENO2-4 (lanes 9 and 10 of
the original unprocessed ENO2 blot in the Supplementary Information),
whereas lanes 9 and 10 of the vinculin blot should have been used as the
loading control. The correct loading control lanes are now
shown (Supplementary Information). None of these corrections alters the
original meaning of the experiments, their results, their interpretation
or the conclusions of the paper. We apologize for any confusion this
may have caused to the readers of Nature.

Figure 1 | This is the corrected Fig. 2d of the original Article, with excision
indicated.

Figure 2 | This is the corrected Fig. 3a of the original Article, with excision
indicated.


Supplementary Information is available in the online version of this corrigendum. 
CAREERS


TILT THE ODDS A physicist uses his skills in 
statistical analysis in finance p.281 


SELF PROMOTION Social media is a powerful 
tool for promoting work go.nature.com/rsejgj 


NATUREJOBS For the latest career 
listings and advice www.naturejobs.com 


EMPLOYEE BENEFITS 


Plight of the postdoc 


As institutions attempt to redefine the postdoctoral position, early-career researchers are 
joining together to wage a battle for proper benefits. 


BY HELEN SHEN 


Anna Kalashnikova is a master of … The University of California
(UC), Davis, postdoc arrives at her laboratory by 8 a.m., knowing that she must finish
her experiments by 5:35 p.m. every night. After
a long day spent studying how disease-related
modifications of DNA-packaging ‘histone’
proteins are regulated, she must cycle 25 minutes
to her son Maxim’s child-care facility,
which closes at 6 p.m. sharp.

For every minute that she is late to pick him
up, Kalashnikova must pay extra fees. And as a
single mother supporting her child on a postdoc’s
salary, there is little wiggle room. Nearly
half of her monthly income goes towards child
care, and one-third covers rent and utilities at
the house she shares with a roommate. “There’s
this constant stress, because if something unexpected
happens, we’d be in big trouble,” she says.

Kalashnikova is one of many postdocs in
the UC system who are hoping that their circumstances
may soon improve. They are in the
midst of negotiating a new contract with the
university administration: with the current
contract set to expire on 30 September, the UC
postdocs’ labour union is pushing for improvements
on many fronts, including salary, career
development and child-care support.

The negotiations come at a time of great turmoil
for postdoctoral researchers worldwide,
as academic science faces a critical oversupply
of postdocs and a shortage of tenured faculty
positions. And as early-career researchers find
themselves stuck with low pay and minimal
benefits for longer periods, postdocs and their
advocates at several institutions, including
the University of Maryland (UMD) in College
Park and the Howard Hughes Medical
Institute (HHMI) in nearby Chevy Chase, are
fighting, with varying degrees of success,
for greater benefits, and standardized titles and
rights (see ‘A postdoc by any other name’).

“These are not new problems, but they are
perhaps being more acutely felt now. As grant
paylines remain low, universities are hiring
fewer tenure-track faculty and postdocs
remain an abundant source of low-cost labour,”
says Keith Micoli, chairman of the board of the
US National Postdoctoral Association in
Washington DC, a non-profit organization
that has been advocating for postdocs’ rights
since 2003 (see Nature http://doi.org/65x;
2012). “The situations at the Howard Hughes
Medical Institute, University of Maryland and
the University of California system are the
latest examples of challenges facing postdocs,”
he says.


BATTLE FOR BENEFITS 

At institutions across the globe, lab heads 
have had to support a ballooning population 
of postdocs with tight, government-funded 
research budgets. Even some major, privately 
funded institutions have recently cut back on 
benefits for postdocs. In September 2014, for 
example, the HHMI angered many postdocs 
when it announced that it would reduce some 
of their long-term benefits beginning in 2015. 

In an e-mail to employees, the institute 
explained that postdocs spend a short time at 
the HHMI and so their benefits priorities often 
differ from those of other employees. 

At the HHMI’s Janelia Research Campus
in Ashburn, Virginia, postdocs bristled at the
institute’s reasoning. “They’re operating under
the assumption that the postdoc is a short,
transient position, when the truth is that most
postdocs are lasting about five years,” says
neuroscientist Eric Yttri, co-president of the Janelia
Association of Research Scientists, which represents
postdocs and other staff scientists.

As of this year, postdocs no longer receive
retirement contributions from the HHMI (a
standard benefit given to most other employees
that equates to 5% of their annual salaries).
In addition, employees hired after the start of
2015 no longer receive ‘benefits credits’, an
income supplement that the HHMI gave twice a
month in the past to help offset health-care
costs. The credits will continue to accrue for
employees who were hired before 1 January
2015, but will remain frozen at 2014 levels.

In exchange for these cuts in long-term
benefits, HHMI revamped its employee
health-insurance programme, introducing
new options that were intended to be more
affordable for postdocs. Nevertheless,
some postdocs saw their health-care costs
rise in 2015 compared with 2014. Cory Schreckengost, director of
administrative operations at Janelia Farm,
declined to disclose specific details about the
organization’s employee benefits and costs,
but emphasized in an e-mail that “HHMI took
mitigating steps specifically for postdocs to
limit the effects of the rising medical care costs
that impact all HHMI employees”.

Janelia postdocs raised vocal objections to 
this raft of changes, which in turn brought 
about some concessions. For example, the 
HHMI added a US$1,500 annual supplement 
for postdocs, to be used at their discretion. 
But the institute remained steadfast on cutting
the retirement benefit, citing the same
practice elsewhere. “Most universities and
research institutes do not contribute to retirement
accounts for postdoctoral associates, and
HHMI has chosen to move to this standard,”
the institute wrote in an e-mail to employees.

Despite signs of a cool climate for postdocs
overall, UC postdocs have prevailed over economic
challenges in the past. They formed
their union in 2008 and secured their first
contract with the university in 2010, all in
the middle of a state budget crisis.

Molecular biologist Anna Kalashnikova takes time away from the lab to enjoy the outdoors with her son.

280 | NATURE | VOL 525 | 10 SEPTEMBER 2015

Among other advances, the initial five-year
agreement provided salary increases, retirement
contributions and guaranteed time
off for holidays and for personal or medical
reasons. “Of course, there is a lot of room
for improvement,” says Anke Schennink,
president of the United Auto Workers local
affiliate that represents the UC system’s roughly
6,000 postdocs.

The union has already made some progress 
in its second round, reaching a preliminary 
agreement with the university on 6 August that 
would secure postdocs’ right to pursue career 
counselling and career-development activities 
on paid time. 

Given the scarcity of tenure-track positions,
the next step for many postdocs will probably
involve an exit from academia. “Statistically,
it’s not in any individual postdoc’s
favour to be completely focused on any one
career path. Having this time put onto paper
would basically recognize that a postdoc is
a training position, and that it’s important
for us to work on career development,” says
Jessica Lao, a postdoc at the University of
California, San Francisco (UCSF), and a member
of ‘P(ostdoc)-Value’, a grassroots postdoc-advocacy
group at UCSF. Lao leads the group’s
efforts to promote UCSF’s career services and
pilot programmes for postdocs to tour biotechnology
companies and gain hands-on
experience in other non-academic careers.


FAMILY MATTERS 

Child care is another priority. Many graduate
students across the ten campuses of the university
system already qualify for financial support
for child-care expenses: up to US$900 per
quarter or $1,350 each four-month semester.
But postdocs do not have such benefits.

Also on the bargaining table is salary, a
perennial hot-button topic. For UC postdocs, as
for many of their US peers, minimum salaries
are pegged to guidelines published by the US
National Institutes of Health. But stipends that
are equivalent in value can mean vastly different
standards of living, depending on where the
postdoc resides. “California is expensive, if you
compare it in terms of cost of living around the
country,” says Schennink. “We think postdocs
should receive fair compensation,” she says.

The high cost of living in California has 
proved particularly challenging for Abby 
Kroken, a postdoc at the University of 
California, Berkeley, who now spends more 
than 60% of her take-home pay each month 
on housing, compared with 30% when she was 
a graduate student at the Medical College of 
Wisconsin in Milwaukee. 

Kroken had carefully studied housing
rates and living expenses before coming to
UC Berkeley in January 2014 to study bacterial
eye infections. And although she thought
she was prepared, Kroken could not predict
that her husband, who relocated with her,
would be unable to find work for about
11 months. Between the high cost of living
in Berkeley and the couple’s student-loan
obligations, they soon depleted their savings
and had to borrow money from their parents
to make ends meet.

“It felt like I’d made a gigantic financial
mistake in trying to advance my career,” says
Kroken. For the first time, after focusing
exclusively on an academic-research career,
she began to consider a job in industry. She
also thought about moving back to Wisconsin.

But things began to turn around last
December, when Kroken’s husband found
work as a technical writer. By following
a strict budget, the couple is now close to
restoring their previous savings.

Kroken says that her supervisor has
given her much-needed encouragement to
continue pursuing an academic career. But
despite her improved outlook, she says that
the past year and a half has underscored just
how important adequate compensation is for
her continued professional development. “I
do want to be a professor, I do like research
and I even like writing grants,” says Kroken.
“I don’t want to have to leave this career path
because I can’t afford to do it.” ■


Helen Shen is a freelance writer in
Sunnyvale, California.


IDENTITY ISSUES


A postdoc by any other name


What is a ‘postdoc’? In academia, the lack
of a standard definition can create a host
of problems for researchers.

Some postdocs are classified as
university employees, eligible for standard
benefits such as health care, child-care
support and retirement contributions. But
many fall into a hodge-podge of trainee
or temporary-worker categories that do
not qualify for all the benefits enjoyed by
graduate students, faculty members and
staff. Often, these ill-defined postdocs lack
administrative offices that are dedicated
to their professional development, fair
treatment and job security.

But standardizing the postdoc position
is no simple feat, especially when it
means extra costs for the institutions and
the individual investigators that employ
postdocs. Earlier this year, the University of
Maryland (UMD) administration in College
Park ran afoul of its life-sciences professors
when it tried to eliminate one of two hiring
categories for postdocs. The now-defunct
category was a contract position with a
faculty title and few benefits, which made
it a less-expensive option for principal
investigators. Only 15% of UMD postdocs
fell into this class, but it had commonly
been used to hire biomedical postdocs.

“Lots of life-sciences faculty were
responding to National Institutes of Health
and National Science Foundation budget
cuts. They were pinching pennies,” says
Jonathan Dinman, chair of cell biology and
molecular genetics at UMD.

The other category, a non-tenured
faculty position, provided postdocs with
standard health and retirement benefits,
paid medical leave and tuition remission
for employees and family members, at
a cost that many biomedical lab heads
deemed untenable. In a letter to the
university president, more than 130 life
scientists wrote that forcing them to use
the latter category would add expenses
that could not be justified to governmental
granting agencies, and would lead to
personnel cuts and decreased productivity,
amounting to a “death spiral”.

But others, including UMD astronomer
Marc Pound, argued in favour of the
benefits expansion. “Postdocs are kind
of a silent majority on campus. They
come here for maybe three or six years
and move on, and they never really have
advocates amongst themselves,” says the
senior research scientist.

Ultimately, the administration created
a new classification scheme that started
on 1 July, one that guarantees all
postdocs some benefits, but enables lab
heads to offer a smaller starting benefits
package for less-experienced postdocs.
‘Post-doctoral associates’ will receive the
complete benefits package previously
offered to most postdocs. ‘Post-doctoral
scholars’ will get the same benefits,
except for tuition remission, which had
tended to be the most expensive and
unpredictable expense for lab supervisors.

Postdocs can be hired directly into the
associates category, but lab heads can also
choose to hire early-career postdocs at the
scholars level. After three years, however,
those postdocs must be promoted to
associates if their supervisors wish to
renew their contracts. And after a total of
six years in either category, postdocs must
advance to a research-scientist track.

“It’s still a mandate, but now we’ve got
three years to adjust and figure out how
to do it,” says Dinman, a signatory of the
letter. “It has increased the cost of doing
business, that’s for sure, but in the end, I
think the right thing was done.” H.S.
TRADE TALK
Fund manager 


Ben Peters is an investment director at Evenlode Income, an independent fund-management company in Chipping Norton, UK. He explains how his PhD in physics helped to smooth his transition into the world of finance and investments.


Why did you leave academia? 

I enjoyed my PhD programme in nanophysics
and wouldn't have been averse to staying in 
academia. But having to reapply for funding 
every few years didn't much appeal to me, and 
halfway through my programme, I became 
interested in investment management through 
my brother-in-law, Hugh, whom I now work 
with. So after graduating in 2008, I moved into 
this industry. 


How has your PhD work helped you in your 
role as a fund manager? 

Mathematical and statistical-analysis skills 
are highly valued and important in this 
industry. They helped me to get through the 
door. I constructed a method for quantitative 
analysis of companies’ financial information, 
and I use statistical techniques to look at the 
risk in any investment. I also developed a lot 
of soft skills by doing research, particularly 
on collaborative projects. If you're an experi- 
mental scientist, as I was, you have to be flex- 
ible — you might have to change research 
directions, and all PhD students learn how
to organize themselves and react to what is
going on. It’s a good skill for this field.


What do you enjoy about your job? 

I’m finding out how the world works. As a 
physicist, I was beginning to understand the 
material world, but this is more about the 
human and economic worlds. The interac- 
tions of people around the globe and what 
they do and how they create value — I find 
that fascinating. As a fund manager, I have 
to figure out how the world works and how 
it might evolve over time, but also accept 
the extremely uncertain nature of economic 
systems. So I have had to develop an invest- 
ment process that ultimately results in action 
— making an investment — while knowing 
that it is a game of tilting the odds, rather 
than one of certainties.


INTERVIEW BY JULIE GOULD 


This interview has been edited for length and clarity. 
SCIENCE FICTION


NEURAUGMENT, VERB 


Standard United Galactic Treaty Dictionary, 2nd edn, Caelum Univ. Press (11 June 2287)


BY FELICIA DAVIN 


neuraugment, v. 
Full entry. All quotations shown. 


Pronunciation: Standard Treaty 
[‘nusagment], N Earth Eng ['‘nsagment] 
Forms: neuraug, neurog 

Etymology: <English neural augmentation, any
technological enhancement to the human brain
(21st cent.), <ancient Greek νεῦρον nerve (see
neuro- comb. form) + -al suffix and <Old French
aumentacion, see AUGMENT, v. Compare
neuraugmentation n., neuraugmented or
neurogged adj.


1. trans. To enhance the function of the brain 
through biomechanical implants of any type. 


2098 Newz.ly 3 January: Dr Sharma proposes 
to, in her words, “neuraugment” the civil- 
ian public. She acknowledges that previous 
attempts to do so have gone awry, but insists 
that her method is safer and less obtrusive 
than any of her predecessors. “Initially, I
wanted to help people who were unable to
retain information,” she says. “But this could
be so much more. It could help us learn each
other’s languages.”


2108 G. RODRIGUEZ Parenting in the New
Century: It’s all the rage among the rich 
and famous to neuraugment themselves 
and even their young children. Youngsters 
with perfect pitch and eidetic memories are 
common in the climate-controlled enclaves 
of the Pacific Northwest. I once met a six- 
year-old who had read Madame Bovary in 
the original French. Her review? “Boring.” 
Oh, to be six and feel such crushing ennui. 


2132 L. JOHNSON North American Educa-
tors Quarterly (Dry season): Many teachers
who can’t afford to neuraugment themselves
have trouble connecting with a generation of 
students for whom it is the norm, and there 
has been much discussion of the future of 
public education. 


2147 Shermer Morning News 30 April: Ms 
Alcantara has refused to neuraugment her- 
self, saying: “I don’t trust it. How do you 
know what they’re really putting in you? I 
don't need anybody else in my head.” She is 
now heading the Natural Brain movement. 


2. trans. To join with one or more individuals 
through a type of neuraugmentation allowing 
instant mental communication, see NEURALINK, n.


2120 seoul.times 
15 March: Dr Park 
and his collabora- 
tor Dr Leary pro- 
pose to improve 
upon Dr Sharma's 
methods. “It’s great, 
what she did,” Park
says. “Incredible, 
really. But we can do 
so much more. Why stop at 
neuraugmenting one brain?” 

Leary picks up right where Park leaves off: 
“Why not neuraugment two brains, or more 
than two? Think of the computational power 
we could access through our neuralink.”


2158 politics/anon 9 September: The United 
Galactic Treaty — a grand name for what's 
barely more than a dozen Earth nations 
and a space station that got a wild hair to 
design its own flag — well, anyway, the Trea- 
ty’s going to recruit citizens by offering to 
neuraugment them for free if they'll join up. 
The whole thing’s doomed if you ask me, as 
they can only link three or maybe four people 
at a time without one of them crashing. 


2163 Galactic Treaty Daily 2 July: Tragedy 
has struck aboard the Jespersen, where five 
young recruits who agreed to neuraugment 
have suffered aneurysms. 


2169 Galactic Treaty Daily 28 December: “It
is completely safe to neuraug two people,” the
doctor insists. “Some people have the consti-
tution — we don't know why just yet — that
allows them to neuraug three people. I am
confident that more is possible, but the safety
of our patients is always our first concern.”


2225 A. CHEN Sex and Love in the Neuraug- 
mented Age: Georgia, who has asked that 
I change her name to protect her privacy 
because her parents do not approve of her 
lifestyle, neuraugmented four other people, 
and she loves them all. “I couldn’t live with-
out them,” she tells me. “It sounds so lonely.”


2284 L. MARTINEZ Collected Correspond- 
ence: He left me a diamond ring on the 
kitchen counter, attached to a note that 
said: “I read people used to give these to 
each other back in the days of civil marriage 
licences and reading vows and everything
— thought it was
cute. Neurog me?”

© 2015 Macmillan Publishers Limited. All rights reserved


DERIVATIVES 
neuraugmented, 
freq. neurogged, 
adj. Possessing brain 
enhancements; inti- 
mate with or insepa- 
rable from another 
person, either literally 
or figuratively linked via
neuraugmentation.


2150 ANONYMOUS Yakt 2 August:
All these neuraugmented jerks think they’re 
better than us. 


2237 R. Rat Stars Among the Stars 12 Febru- 
ary: “I hate these parties,” the actress sighs, 
taking a sip of champagne imported from 
the surface. The party is silent except for the 
clink of glasses. We are the only two people 
speaking aloud. “Everyone's so neurogged 
and full of themselves. Or full of each other, I 
guess. I just want something real, you know?” 


2278 Galactic Treaty Daily 17 May: The staff 
members agree among themselves that the 
Secretary General and the Chair of Military 
Operations are so neurogged that approach- 
ing one of them is essentially the same as 
approaching both of them, but this was not 
always the case. 


neuraugmentation, n. Any technological 
enhancement to the brain. 


2136 K. Nsonwau A Natural and Technolog- 
ical History of the Brain, epilogue: Dr Sharma 
has been reclusive in her retirement, but 
she graciously invited me into her home 
to speak with me. She offered me tea in her 
tidy succulent garden, and she was so wel- 
coming that I could not resist asking a per- 
sonal question. “Is it true you've never had 
any neuraugmentation?” I said. Dr Sharma 
shrugged. “I never felt the need. Aisha has a 
perfect memory,” she said, speaking of her
partner of decades. “So if I forget something,
I just ask her. As for the neuralink, oh, I don't
know.” She paused to adjust her shawl. “It's
nice, don’t you think? Talking?”


Felicia Davin is a linguist and translator 
in western Massachusetts. Her fiction has 
previously been published in Lightspeed. 
You can find her on Twitter @FeliciaDavin.